We consider the situation in which digital data is to be reliably transmitted over a discrete, memoryless channel (DMC) that is subjected to a wire-tap at the receiver. We assume that the wire-tapper views the channel output via a second DMC. Encoding by the transmitter and decoding by the receiver are permitted. However, the code books used in these operations are assumed to be known by the wire-tapper. The designer attempts to build the encoder-decoder in such a way as to maximize the transmission rate R, and the equivocation d of the data as seen by the wire-tapper. In this paper, we find the trade-off curve between R and d, assuming essentially perfect ("error-free") transmission. In particular, if d is equal to $H_S$, the entropy of the data source, then we consider that the transmission is accomplished in perfect secrecy. Our results imply that there exists a $C_s > 0$, such that reliable transmission at rates up to $C_s$ is possible in approximately perfect secrecy.

I. INTRODUCTION 

In this paper we study a (perhaps noisy) communication system that is being wire-tapped via a second noisy channel. Our object is to encode the data in such a way that the wire-tapper's level of confusion will be as high as possible. To fix ideas, consider first the simple special case depicted in Fig. 1 (in which the main communication system is noiseless). The source emits a data sequence $S_1, S_2, \ldots$, which consists of independent copies of the binary random variable $S$, where $\Pr\{S = 0\} = \Pr\{S = 1\} = \tfrac{1}{2}$. The encoder examines the first $K$ source bits $\mathbf{S}^K = (S_1, \ldots, S_K)$ and encodes $\mathbf{S}^K$ into a binary $N$-vector $\mathbf{X}^N = (X_1, \ldots, X_N)$. $\mathbf{X}^N$ in turn is transmitted perfectly to the decoder via the noiseless channel and is transformed into a binary data stream $\hat{\mathbf{S}}^K = (\hat S_1, \ldots, \hat S_K)$ for delivery to the destination. The "error probability" is defined as

$$P_e = \frac{1}{K}\sum_{k=1}^{K} \Pr\{S_k \neq \hat S_k\}. \tag{1}$$

The entire process is repeated on successive blocks of $K$ source bits. The transmission rate is $K/N$ bits per transmitted channel symbol.

Fig. 1. Wire-tap channel (special case).

The wire-tapper observes the encoded vector $\mathbf{X}^N$ through a (memoryless) binary symmetric channel (BSC) with crossover probability $p_0$ ($0 < p_0 \le \tfrac{1}{2}$). The corresponding output at the wire-tap is $\mathbf{Z}^N = (Z_1, \ldots, Z_N)$, so that for $x, z = 0, 1$ ($1 \le n \le N$),

$$\Pr\{Z_n = z \mid X_n = x\} = (1 - p_0)\,\delta_{x,z} + p_0\,(1 - \delta_{x,z}),$$

where $\delta_{x,z}$ is the Kronecker delta.



We take the equivocation

$$\Delta = \frac{1}{K} H(\mathbf{S}^K \mid \mathbf{Z}^N) \tag{2}$$

as a measure of the degree to which the wire-tapper is confused. The logarithms in $H$ are, as are all logarithms in this paper, taken to the base 2. The system designer would like to have $P_e$ close to zero, with $K/N$ and $\Delta$ as large as possible.
Consider the following schemes:

(i) Set $K = N = 1$, and let $X_1 = S_1$. This results in $P_e = 0$, $K/N = 1$, and $\Delta = H(X_1 \mid Z_1) = h(p_0)$, where

$$h(\lambda) = -\lambda \log \lambda - (1 - \lambda) \log (1 - \lambda), \quad 0 \le \lambda \le 1 \tag{3}$$

(take $0 \log 0 = 0$).
(ii) Set $K = 1$, and let $N$ be arbitrary. Let $C_0$ be the subset of binary $N$-space, $\{0, 1\}^N$, consisting of those $N$-vectors with even parity (i.e., an even number of 1's). Let $C_1 \subset \{0, 1\}^N$ be the subset of vectors with odd parity. The encoder works as follows. When $S_1 = i$ ($i = 0, 1$), the encoder output $\mathbf{X}^N$ is a randomly chosen vector in $C_i$. Thus, the encoder is a channel with transition probability

$$\Pr\{\mathbf{X}^N = \mathbf{x} \mid S_1 = i\} = \begin{cases} 2^{-(N-1)}, & \mathbf{x} \in C_i, \\ 0, & \mathbf{x} \notin C_i, \end{cases}$$

for $i = 0, 1$. Clearly, the decoder can recover $S_1$ from $\mathbf{X}^N$ perfectly, so that $P_e = 0$. We now turn to the wire-tapper who observes $\mathbf{Z}^N$, the output of the BSC corresponding to the input $\mathbf{X}^N$. Let $\mathbf{z} \in \{0, 1\}^N$ be a vector of, say, even parity. Then

$$\begin{aligned}
\Pr\{S_1 = 0 \mid \mathbf{Z}^N = \mathbf{z}\} &= \Pr\{\text{the BSC makes an even number of errors}\} \\
&= \sum_{j \text{ even}} \binom{N}{j} p_0^{\,j} (1 - p_0)^{N-j} = \tfrac{1}{2} + \tfrac{1}{2}(1 - 2p_0)^N.
\end{aligned}$$

The last equality can be verified by applying the binomial formula to

$$[(1 - p_0) \pm x p_0]^N = \sum_{j=0}^{N} \binom{N}{j} p_0^{\,j} (1 - p_0)^{N-j} (\pm x)^j.$$

Then, adding the two expansions with $x = 1$,

$$2 \sum_{j \text{ even}} \binom{N}{j} p_0^{\,j} (1 - p_0)^{N-j} = (1 - p_0 + p_0)^N + (1 - p_0 - p_0)^N = 1 + (1 - 2p_0)^N$$

(S. P. Lloyd). Similarly, for $\mathbf{z} \in \{0, 1\}^N$ of odd parity,

$$\Pr\{S_1 = 0 \mid \mathbf{Z}^N = \mathbf{z}\} = \Pr\{\text{the BSC makes an odd number of errors}\} = \tfrac{1}{2} - \tfrac{1}{2}(1 - 2p_0)^N.$$

Therefore, for all $\mathbf{z} \in \{0, 1\}^N$,

$$H(S_1 \mid \mathbf{Z}^N = \mathbf{z}) = h\!\left[\tfrac{1}{2} - \tfrac{1}{2}(1 - 2p_0)^N\right],$$

so that

$$\Delta = H(S_1 \mid \mathbf{Z}^N) = h\!\left[\tfrac{1}{2} - \tfrac{1}{2}(1 - 2p_0)^N\right] \to 1 = H(S_1), \quad \text{as } N \to \infty.$$

Thus, as $N \to \infty$, the equivocation at the wire-tap approaches the unconditional source entropy, so that communication is accomplished in perfect secrecy. The "catch" is that, as $N \to \infty$, the transmission rate $K/N = 1/N \to 0$.
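The closed form for $\Delta$ in scheme (ii) can be checked numerically. The Python sketch below is our own illustration (the function names are hypothetical): it evaluates $h[\tfrac{1}{2} - \tfrac{1}{2}(1 - 2p_0)^N]$ and compares it with a brute-force computation of $H(S_1 \mid \mathbf{Z}^N)$ that enumerates every input and output vector. The enumeration is exponential in $N$ and is meant only for small $N$.

```python
from itertools import product
from math import log2


def h(lam):
    """Binary entropy function (3) in bits, with 0 log 0 = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


def parity_equivocation(p0, n):
    """Closed form Delta = h[1/2 - (1/2)(1 - 2 p0)^N] for scheme (ii)."""
    return h(0.5 - 0.5 * (1 - 2 * p0) ** n)


def parity_equivocation_brute_force(p0, n):
    """H(S_1 | Z^N) by enumeration: S_1 uniform, X^N uniform on the vectors of
    parity S_1, and Z^N equal to X^N sent through a BSC with crossover p0."""
    vectors = list(product((0, 1), repeat=n))
    joint = {}  # joint[(s, z)] = Pr{S_1 = s, Z^N = z}
    for x in vectors:
        s = sum(x) % 2
        p_x = 0.5 / 2 ** (n - 1)  # Pr{S_1 = s} * Pr{X^N = x | S_1 = s}
        for z in vectors:
            flips = sum(a != b for a, b in zip(x, z))
            p_z_given_x = p0 ** flips * (1 - p0) ** (n - flips)
            joint[(s, z)] = joint.get((s, z), 0.0) + p_x * p_z_given_x
    total = 0.0
    for z in vectors:
        p_z = joint[(0, z)] + joint[(1, z)]
        if p_z > 0:
            total += p_z * h(joint[(0, z)] / p_z)  # Pr{Z = z} * H(S_1 | Z^N = z)
    return total


# The rate K/N = 1/N falls while the equivocation climbs toward H(S_1) = 1.
for n in (1, 2, 4, 8, 16):
    print(f"N={n:2d}  rate={1 / n:.3f}  Delta={parity_equivocation(0.1, n):.4f}")
```

The brute-force routine is there only as an independent check on the parity-counting argument; for anything beyond a dozen or so bits the closed form is the practical choice.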
A central question to which this paper is addressed is whether or not it is possible to transmit at a rate bounded away from zero, and yet achieve approximately perfect secrecy, i.e., $\Delta \approx H(S_1)$. Before giving the answer to this question, we shall describe the more general problem that is addressed in the sequel.

Refer to Fig. 2. The source is discrete and memoryless with entropy $H_S$. The "main channel" and the "wire-tap channel" are discrete memoryless channels with transition probabilities $Q_M(\cdot \mid \cdot)$ and $Q_W(\cdot \mid \cdot)$, respectively. The source and the transition probabilities $Q_M$ and $Q_W$ are given and fixed. The encoder, as in the above example, is a channel with the $K$-vector $\mathbf{S}^K$ as input and the $N$-vector $\mathbf{X}^N$ as output. The vector $\mathbf{X}^N$ is in turn the input to the main channel. The main channel output and the wire-tap channel input is $\mathbf{Y}^N$. The wire-tap channel output is $\mathbf{Z}^N$. The decoder associates a $K$-vector $\hat{\mathbf{S}}^K$ with $\mathbf{Y}^N$, and the error probability $P_e$ is given by (1). The equivocation $\Delta$ is given by (2), and the transmission rate is $K H_S / N$ source bits per channel input symbol. Roughly speaking, a pair $(R, d)$ is achievable if it is possible to find an encoder-decoder with arbitrarily small $P_e$, and $K H_S / N$ about $R$, and $\Delta$ about $d$ (with perhaps $N$ and $K$ very large). Our main problem is the characterization of the family of achievable $(R, d)$ pairs, and such a characterization is given in Theorem 2. It turns out (Theorem 3) that, in nearly every case, there exists a "secrecy capacity," $C_s > 0$, such that $(C_s, H_S)$ is achievable [while, for $R > C_s$, $(R, H_S)$ is not achievable]. Thus, it is possible to reliably transmit information at the positive rate $C_s$ in essentially perfect secrecy.

For the special case of our introductory example ($H_S = 1$, $Q_M$ corresponding to a noiseless channel and $Q_W$ to a BSC), the conclusion of Theorem 2 specializes to the assertion that $(R, d)$ is achievable if and only if $0 \le R \le 1$, $0 \le d \le 1$, and $Rd \le h(p_0)$. Note that scheme (i) suggested above for this special case asserts that $R = 1$, $d = h(p_0)$ is achievable. From Theorem 2, this value of $d = h(p_0)$ is the maximum achievable $d$, if $R = 1$. Scheme (ii) above asserts that $R = 0$, $d = 1$ is achievable, but this is distinctly suboptimal since from Theorem 2, $R = h(p_0)$, $d = 1$ is achievable. Thus, reliable transmission at a rate $h(p_0)$ is possible with perfect secrecy, and $C_s = h(p_0)$.

Fig. 2. Wire-tap channel (general case).

An outline of the remainder of this paper now follows. In Section II, we give a formal statement of the problem and state the main results (Theorems 2 and 3). In Section III we give a proof of Theorem 2 for the special case discussed above (main channel noiseless, wire-tap channel a BSC). In Section IV, we prove the converse half of Theorem 2, and in Section V the direct half of that theorem.

II. FORMAL STATEMENT OF THE PROBLEM AND SUMMARY OF RESULTS 

In this section we give a precise statement of the problem that we 
stated informally in Section I. We then summarize our results. 

First, a word about notation. Let $\mathcal{U}$ be an arbitrary finite set. Denote its cardinality by $|\mathcal{U}|$. Consider $\mathcal{U}^N$, the set of $N$-vectors with components in $\mathcal{U}$. The members of $\mathcal{U}^N$ will be written as

$$\mathbf{u}^N = (u_1, u_2, \ldots, u_N),$$

where subscripted letters denote the components and boldface superscripted letters denote vectors. A similar convention applies to random vectors and random variables, which are denoted by upper-case letters. When the dimension $N$ of a vector is clear from the context, we omit the superscript.

For random variables $X, Y, Z$, etc., the notation $H(X)$, $H(X \mid Y)$, $I(X; Y)$, $I(X; Y \mid Z)$, etc., denotes the standard information quantities as defined in Gallager (Ref. 1). The logarithms in these quantities are, as are all logarithms in this paper, taken to the base 2. Finally, for $n = 3, 4, 5, \ldots$, we say that the sequence of random variables $\{X_i\}_{i=1}^{n}$ is a "Markov chain" if $(X_1, X_2, \ldots, X_{j-1})$ and $(X_{j+1}, \ldots, X_n)$ are conditionally independent, given $X_j$ ($1 < j < n$). We make repeated use of the fact that, if $X_1, X_2, X_3$ is a Markov chain, then

$$H(X_3 \mid X_1, X_2) = H(X_3 \mid X_2). \tag{4}$$

At this point we call attention to Appendix A, in which the data- 
processing theorem and Fano's inequality are given in several forms. 
We now turn to the description of the communication system. We 
assume that the system designer is given a source and two channels 
that are defined as follows. 

(i) The source is defined by the sequence $\{S_k\}_{k=1}^{\infty}$, where the $S_k$ are independent, identically distributed random variables that take values in the finite set $\mathcal{S}$. We assume that the probability law that defines the $\{S_k\}$ is known. Let the entropy $H(S_k) = H_S$. In Appendix C we show how to extend the results of this paper to arbitrary stationary finite-alphabet ergodic sources.

WIRE-TAP CHANNEL 1359

(ii) The main channel is a discrete memoryless channel with finite input alphabet $\mathcal{X}$, finite output alphabet $\mathcal{Y}$, and transition probability $Q_M(y \mid x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Since the channel is memoryless, the transition probability for $N$-vectors is

$$Q_M^{(N)}(\mathbf{y} \mid \mathbf{x}) = \prod_{n=1}^{N} Q_M(y_n \mid x_n). \tag{5}$$

Denote the channel capacity of the main channel by $C_M$.

(iii) The wire-tap channel is also a discrete memoryless channel with input alphabet $\mathcal{Y}$, finite output alphabet $\mathcal{Z}$, and transition probability $Q_W(z \mid y)$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$. The cascade of the main channel and the wire-tap channel is another memoryless channel with transition probability

$$Q_{MW}(z \mid x) = \sum_{y \in \mathcal{Y}} Q_W(z \mid y)\, Q_M(y \mid x). \tag{6}$$

Occasionally, when there is no ambiguity, we use the transition probability of a channel to denote the channel itself. Let $C_{MW}$ be the capacity of channel $Q_{MW}$.
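With channels written as row-stochastic matrices ($Q_M$ indexed by $[x][y]$, $Q_W$ by $[y][z]$), (6) is simply a matrix product. A minimal Python sketch under that representation; the helper names `cascade` and `bsc` are our own, not the paper's:

```python
def cascade(q_m, q_w):
    """Q_MW(z|x) = sum over y of Q_W(z|y) Q_M(y|x), per (6).

    Channels are row-stochastic matrices stored as lists of rows:
    q_m[x][y] = Q_M(y|x) and q_w[y][z] = Q_W(z|y).
    """
    n_y, n_z = len(q_w), len(q_w[0])
    if any(len(row) != n_y for row in q_m):
        raise ValueError("output alphabet of Q_M must be the input alphabet of Q_W")
    return [[sum(row[y] * q_w[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in q_m]


def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover p."""
    return [[1 - p, p], [p, 1 - p]]


# Two BSCs in cascade form a BSC with crossover p1(1 - p2) + p2(1 - p1).
q_mw = cascade(bsc(0.1), bsc(0.2))
print(q_mw)
```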

With the source statistics and channels $Q_M$ and $Q_W$ given, the designer must specify an encoder and a decoder, defined as follows.

(iv) The encoder with parameters $(K, N)$ is another channel with input alphabet $\mathcal{S}^K$, output alphabet $\mathcal{X}^N$, and transition probability $q_E(\mathbf{x} \mid \mathbf{s})$, $\mathbf{s} \in \mathcal{S}^K$, $\mathbf{x} \in \mathcal{X}^N$. When the $K$ source variables $\mathbf{S}^K = (S_1, \ldots, S_K)$ are the input to the encoder, the output is the random vector $\mathbf{X}^N$. Let $\mathbf{Y}^N$ and $\mathbf{Z}^N$ be the outputs of channels $Q_M^{(N)}$ and $Q_{MW}^{(N)}$, respectively, when the input is $\mathbf{X}^N$. The equivocation of the source at the output of the wire-tap channel (corresponding to a particular encoder) is

$$\Delta = \frac{1}{K} H(\mathbf{S}^K \mid \mathbf{Z}^N). \tag{7}$$

We take $\Delta$ as our criterion of the wire-tapper's confusion. From the system designer's point of view, it is, of course, desirable to make $\Delta$ large.

(v) The decoder is a mapping

$$f_D : \mathcal{Y}^N \to \mathcal{S}^K. \tag{8a}$$

Let $\hat{\mathbf{S}}^K = (\hat S_1, \ldots, \hat S_K) = f_D(\mathbf{Y}^N)$. Corresponding to a given encoder and decoder, the error rate is

$$P_e = \frac{1}{K} \sum_{k=1}^{K} \Pr\{S_k \neq \hat S_k\}. \tag{8b}$$

We refer to the above as an encoder-decoder $(K, N, \Delta, P_e)$.* The applicability of the above to the system in Fig. 2 should be obvious.

Next, we say that the pair $(R, d)$ (where $R, d \ge 0$) is achievable if, for all $\epsilon > 0$, there exists an encoder-decoder $(N, K, \Delta, P_e)$ for which

$$\frac{H_S K}{N} \ge R - \epsilon, \tag{9a}$$

$$\Delta \ge d - \epsilon, \tag{9b}$$

$$P_e \le \epsilon. \tag{9c}$$

Our problem is to characterize the set $\mathcal{R}$ of achievable $(R, d)$ pairs. Let us remark here that it follows immediately from the definition that $\mathcal{R}$ is a closed subset of the first quadrant of the $(R, d)$ plane.
Before stating our characterization of $\mathcal{R}$, we digress to discuss a certain information-theoretic quantity that plays a crucial role in our solution. Consider the channels $Q_M$, $Q_W$, and $Q_{MW}$ defined above. Let $p_X(x)$, $x \in \mathcal{X}$, be a probability mass function and let $X$ be the random variable defined by

$$\Pr\{X = x\} = p_X(x), \quad x \in \mathcal{X}.$$

Let $Y, Z$ be the outputs of channels $Q_M$ and $Q_{MW}$, respectively, when $X$ is the input. For $R \ge 0$, let $\mathcal{P}(R)$ be the set of $p_X$ such that $I(X; Y) \ge R$. Of course, $\mathcal{P}(R)$ is empty for $R > C_M$, the capacity of channel $Q_M$. Finally, for $0 \le R \le C_M$, define

$$\Gamma(R) = \sup_{p_X \in \mathcal{P}(R)} I(X; Y \mid Z). \tag{10}$$

We remark here that, for any distribution $p_X$ on $\mathcal{X}$, the corresponding $X, Y, Z$ forms a Markov chain, so that the definition of mutual information and (4) yield

$$\begin{aligned}
I(X; Y \mid Z) &= H(X \mid Z) - H(X \mid Y, Z) \\
&= H(X \mid Z) - H(X \mid Y) = I(X; Y) - I(X; Z).
\end{aligned} \tag{11}$$

Thus, we can write (10) as

$$\Gamma(R) = \sup_{p_X \in \mathcal{P}(R)} I(X; Y \mid Z) = \sup_{p_X \in \mathcal{P}(R)} \left[I(X; Y) - I(X; Z)\right]. \tag{12}$$
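For a binary input alphabet, (12) can be approximated by a brute-force search over $p_X = (1 - a, a)$. The Python sketch below is our own illustration, not a method from the paper: the grid search, the helper names, and the restriction to $|\mathcal{X}| = 2$ are assumptions, and the result is only as accurate as the grid.

```python
from math import log2


def entropy(pmf):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def mutual_information(p_x, channel):
    """I(X; output) for input pmf p_x and channel[x][o] = Pr{o | x}."""
    n_out = len(channel[0])
    p_out = [sum(p_x[x] * channel[x][o] for x in range(len(p_x))) for o in range(n_out)]
    h_out_given_x = sum(p_x[x] * entropy(channel[x]) for x in range(len(p_x)))
    return entropy(p_out) - h_out_given_x


def gamma(r, q_m, q_mw, grid=2001):
    """Grid approximation of Gamma(R) in (12) for a binary input alphabet.

    Maximizes I(X;Y) - I(X;Z) over p_X = (1 - a, a) with I(X;Y) >= r;
    returns None when no grid point lies in P(r).
    """
    best = None
    for k in range(grid):
        p_x = [1 - k / (grid - 1), k / (grid - 1)]
        i_xy = mutual_information(p_x, q_m)
        if i_xy >= r:
            value = i_xy - mutual_information(p_x, q_mw)
            best = value if best is None else max(best, value)
    return best


# Example (13): noiseless main channel, BSC(0.1) wire-tap, so Gamma(R) = h(0.1).
noiseless = [[1.0, 0.0], [0.0, 1.0]]
bsc_01 = [[0.9, 0.1], [0.1, 0.9]]
for r in (0.0, 0.5, 1.0):
    print(r, gamma(r, noiseless, bsc_01))
```

Because the feasible set $\mathcal{P}(R)$ only shrinks as $R$ grows, this grid approximation is nonincreasing in $R$, in line with Lemma 1(iii).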



* This should be read as "... an encoder-decoder with parameters $(K, N, \Delta, P_e)$."
As an example, suppose that $\mathcal{X} = \mathcal{Y} = \mathcal{Z} = \{0, 1\}$. Let $Q_M$ be a noiseless (binary) channel, and let $Q_W$ be a binary symmetric channel (BSC) with crossover probability $p_0$. Then for arbitrary $p_X$,

$$\begin{aligned}
I(X; Y) - I(X; Z) &= H(X) - [H(Z) - H(Z \mid X)] \\
&= h(p_0) + H(X) - H(Z) \le h(p_0),
\end{aligned}$$

where $h(\cdot)$ is defined in (3). The inequality follows from the well-known fact (see, for example, Ref. 2) that the entropy of the output of a BSC, i.e., $H(Z)$, is not less than the entropy of the input, $H(X)$. Further, $H(X) = H(Z)$ if and only if $p_X(0) = p_X(1) = \tfrac{1}{2}$. Since this distribution belongs to $\mathcal{P}(R)$ for all $R$, $0 \le R \le C_M = 1$, we conclude that, in this case,

$$\Gamma(R) = h(p_0), \quad 0 \le R \le C_M. \tag{13}$$

In Appendix B, we establish the following lemma concerning $\Gamma(R)$.

Lemma 1: The quantity $\Gamma(R)$, $0 \le R \le C_M$, satisfies the following:

(i) The "supremum" in the definition of $\Gamma$ [(10) or (12)] is, in fact, a maximum, i.e., for each $R$, there exists a $p_X \in \mathcal{P}(R)$ such that $I(X; Y \mid Z) = \Gamma(R)$.
(ii) $\Gamma(R)$ is a concave function of $R$.
(iii) $\Gamma(R)$ is nonincreasing in $R$.
(iv) $\Gamma(R)$ is continuous in $R$.
(v) $C_M \ge \Gamma(R) \ge C_M - C_{MW}$, where $C_M$ and $C_{MW}$ are the capacities of channels $Q_M$ and $Q_{MW}$, respectively.

We can now state our main result, the proof of which is given in the remaining sections.

Theorem 2: The set $\mathcal{R}$, as defined above, is equal to $\bar{\mathcal{R}}$, where

$$\bar{\mathcal{R}} = \{(R, d) : 0 \le R \le C_M,\ 0 \le d \le H_S,\ Rd \le H_S\,\Gamma(R)\}. \tag{14}$$
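To make (14) concrete, the Python sketch below (our own illustration; the function name and tolerance are assumptions) tests membership in $\bar{\mathcal{R}}$ for the example of (13), where $H_S = C_M = 1$ and $\Gamma(R) = h(p_0)$. It also exhibits two points on the boundary $Rd = h(p_0)$ whose midpoint lies outside the region, which is the non-convexity discussed in Remark (1). The check relies on $\Gamma$ being the constant $h(p_0)$ and does not apply to other channel pairs.

```python
from math import log2


def h(lam):
    """Binary entropy function (3) in bits."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


def in_region(r, d, p0, h_s=1.0, c_m=1.0, tol=1e-12):
    """Membership in (14) when Gamma(R) = h(p0) on [0, C_M], as in (13)."""
    return (0.0 <= r <= c_m + tol
            and 0.0 <= d <= h_s + tol
            and r * d <= h_s * h(p0) + tol)


p0 = 0.1
a = (1.0, h(p0))   # scheme (i): R = 1, d = h(p0)
b = (h(p0), 1.0)   # R = h(p0) in perfect secrecy
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
print(in_region(*a, p0), in_region(*b, p0), in_region(*mid, p0))
```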
Remarks:

(1) A sketch of a typical region $\mathcal{R}$ is given in Fig. 3. In the above example ($Q_M$ noiseless and $Q_W$ a BSC), $\Gamma(R) = h(p_0)$, a constant, so that the curve $Rd = H_S\,\Gamma(R)$ is a hyperbola. Observe that in this case the region $\mathcal{R}$ is not convex. This is in contrast to the up-to-now essentially universal situation in multiple-user Shannon theory problems, where the solution is nearly always a convex region. Whether or not $\Gamma(R)/R$ is always convex, as it appears in Fig. 3, is an open question.

(2) The points in $\mathcal{R}$ for which $R = C_M$ correspond to data rates of about the capacity of $Q_M$. This is clearly the maximum rate at which reliable transmission over $Q_M$ is possible. An equivocation at the wire-tap of about $H_S\,\Gamma(C_M)/C_M$ is achievable at this rate. An increase in equivocation requires a reduction of transmission rate.
Fig. 3. Region $\mathcal{R}$.

(3) The points in $\mathcal{R}$ for which $d = H_S$ are of considerable interest. These correspond to an equivocation for the wire-tapper of about $H_S$, i.e., perfect secrecy. A transmission rate of

$$C_s = \max_{(R, H_S) \in \mathcal{R}} R$$

is therefore achievable in perfect secrecy. We call $C_s$ the "secrecy capacity" of the channel pair $(Q_M, Q_W)$. The following theorem clarifies this remark.

Theorem 3: If $C_M > C_{MW}$, there exists a unique solution $C_s$ of

$$C_s = \Gamma(C_s). \tag{15}$$

Further, $C_s$ satisfies

$$0 < C_M - C_{MW} \le \Gamma(C_M) \le C_s \le C_M, \tag{16}$$

and $C_s$ is the maximum $R$ such that $(R, H_S) \in \mathcal{R}$.

Proof: Define $G(R) = \Gamma(R) - R$, $0 \le R \le C_M$. From Lemma 1(v),

$$G(C_M) = \Gamma(C_M) - C_M \le 0,$$

and

$$G(0) = \Gamma(0) \ge C_M - C_{MW} > 0.$$

Since by Lemma 1(iii) and (iv), $G(R)$ is continuous and strictly decreasing in $R$ ($\Gamma$ is nonincreasing and $-R$ is strictly decreasing), a unique $C_s \in (0, C_M]$ exists such that $G(C_s) = \Gamma(C_s) - C_s = 0$. This is the unique solution to (15). Inequality (16) follows from $C_s \in (0, C_M]$ and Lemma 1(iii) and (v). Finally, from (15) and (16) we have $(C_s, H_S) \in \bar{\mathcal{R}} = \mathcal{R}$. Also, if $(R_1, H_S) \in \mathcal{R}$, then $H_S R_1 \le H_S\,\Gamma(R_1)$ so that $G(R_1) \ge 0$. Since $G(R)$ is strictly decreasing in $R$, we conclude that $R_1 \le C_s$. Thus, $C_s$ is the maximum of those $R$ for which $(R, H_S) \in \mathcal{R}$, completing the proof.

(4) It is clear that the source statistics enter into the solution only via the source entropy $H_S$. We also remind the reader that the fairly simple extension of Theorems 2 and 3 to a stationary, ergodic source is given in Appendix C.

(5) If we define $P_{ew}$, the "wire-tapper's" error probability, as the error rate at a decoder built by the wire-tapper [defined analogously to (8)], then it follows from Fano's inequality (see Appendix A) that

$$\Delta \le h(P_{ew}) + P_{ew} \log |\mathcal{S}|.$$

Thus, a large value of the equivocation $\Delta$ implies a large value of $P_{ew}$ (which the system designer will find desirable).
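The proof of Theorem 3 locates $C_s$ as the zero of the continuous, strictly decreasing function $G(R) = \Gamma(R) - R$, which suggests computing it by bisection whenever $\Gamma$ can be evaluated (for instance, through a numerical approximation of (12)). A Python sketch, assuming $\Gamma$ is supplied as a callable; the function names are ours, and the routine trusts the hypotheses of Theorem 3 rather than verifying them beyond the endpoint checks.

```python
from math import log2


def secrecy_capacity(gamma, c_m, tol=1e-12):
    """Root of G(R) = Gamma(R) - R on (0, C_M], found by bisection.

    Assumes what the proof of Theorem 3 uses: G is continuous and strictly
    decreasing, G(0) > 0, and G(C_M) <= 0.
    """
    if gamma(0.0) <= 0.0:
        raise ValueError("need G(0) = Gamma(0) > 0, e.g. C_M > C_MW")
    if gamma(c_m) - c_m >= 0.0:
        return c_m                      # G(C_M) = 0, so C_s = C_M
    lo, hi = 0.0, c_m                   # invariant: G(lo) > 0 >= G(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gamma(mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def h(lam):
    """Binary entropy function (3) in bits."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


# Example (13): Gamma(R) = h(p0) is constant, so C_s = h(p0).
print(secrecy_capacity(lambda r: h(0.1), 1.0), h(0.1))
```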

III. PROOF OF THEOREM 2 FOR A SPECIAL CASE 

In this section we prove Theorem 2 for the very special case discussed in Section I. All alphabets $\mathcal{S}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$ are equal to $\{0, 1\}$. The source $\{S_k\}$ satisfies $\Pr\{S_k = 0\} = \Pr\{S_k = 1\} = \tfrac{1}{2}$. Channel $Q_M$ is noiseless, i.e., $Q_M(y \mid x) = \delta_{x,y}$; and channel $Q_W$ is a BSC with crossover probability $p_0$ ($0 \le p_0 \le \tfrac{1}{2}$), i.e.,

$$Q_W(z \mid y) = (1 - p_0)\,\delta_{y,z} + p_0\,(1 - \delta_{y,z}). \tag{17}$$

We show here that $(R, d)$ is achievable if and only if

$$R \le C_M = 1, \quad d \le H_S = 1, \quad Rd \le h(p_0). \tag{18}$$

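Since, for this case, $\Gamma(R) = h(p_0)$, this result is a special case of the as-yet-unproven Theorem 2. We begin with the converse ("only if") part of the result. Let $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Z}^N$ correspond to an encoder-decoder $(N, K, \Delta, P_e)$ (note that $\mathbf{Y}^N = \mathbf{X}^N$). Then, making repeated use of the identity $H(U, V) = H(U) + H(V \mid U)$, we can write (dropping the superscript on vectors)

$$\begin{aligned}
K\Delta = H(\mathbf{S}^K \mid \mathbf{Z}^N) &= H(\mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&= H(\mathbf{S}, \mathbf{X}, \mathbf{Z}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&= H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S}) + H(\mathbf{X}, \mathbf{S}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&\overset{(a)}{=} H(\mathbf{Z} \mid \mathbf{X}) + H(\mathbf{S} \mid \mathbf{X}) + H(\mathbf{X}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&\overset{(b)}{=} N h(p_0) + H(\mathbf{S} \mid \mathbf{X}) + [H(\mathbf{X}) - H(\mathbf{Z})] - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}).
\end{aligned} \tag{19}$$

These steps are justified as follows.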
(a) From the fact that $(\mathbf{S}, \mathbf{X}, \mathbf{Z})$ is a Markov chain and (4), so that $H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S}) = H(\mathbf{Z} \mid \mathbf{X})$.

(b) Since $\mathbf{X}, \mathbf{Z}$ are the input and output, respectively, of a BSC, $H(\mathbf{Z} \mid \mathbf{X}) = N h(p_0)$, regardless of the distribution for $\mathbf{X}$.

Now from Fano's inequality [use ineq. (78) with $V = \mathbf{X}$], we have $H(\mathbf{S} \mid \mathbf{X}) \le K h(P_e)$. Further, the entropy of the output of a BSC is not less than the entropy of the input [this follows from Mrs. Gerber's lemma (Ref. 2, Theorem 1)], so that $H(\mathbf{X}) - H(\mathbf{Z}) \le 0$. Finally, $H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) \ge 0$. Thus, (19) yields for any encoder-decoder $(K, N, \Delta, P_e)$,

$$K\Delta \le N h(p_0) + K h(P_e),$$

or

$$\frac{K}{N}\left[\Delta - h(P_e)\right] \le h(p_0). \tag{20}$$

Now suppose that $(R, d)$ is achievable. It follows from the ordinary converse to the coding theorem (Ref. 1, Th. 4.3.4, p. 81) that $R \le C_M = 1$. Further, since $\Delta \le H_S = 1$, we conclude that $d \le 1$. Finally, if we apply (20) to an encoder-decoder $(N, K, \Delta, P_e)$ that satisfies (9) with $\epsilon > 0$ arbitrary, we have

$$(R - \epsilon)\left[(d - \epsilon) - h(\epsilon)\right] \le h(p_0).$$

Letting $\epsilon \to 0$ yields $Rd \le h(p_0)$. Thus, we have established the converse of Theorem 2, i.e., that an achievable $(R, d)$ must satisfy (18).
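Identity (19) holds for every encoder, and the bound $K\Delta \le N h(p_0) + K h(P_e)$ behind (20) for every decoder, so both can be spot-checked by exact enumeration on a toy instance. In the Python sketch below, the random encoder $q_E(\mathbf{x} \mid \mathbf{s})$, the parameters $K = 2$, $N = 3$, $p_0 = 0.15$, and the MAP decoder are hypothetical choices made only for the check; $Q_M$ is noiseless, so the decoder sees $\mathbf{X}^N$ directly.

```python
import random
from itertools import product
from math import log2


def h(lam):
    """Binary entropy function (3) in bits."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


def check_identity_19(k=2, n=3, p0=0.15, seed=0):
    """Return (K*Delta, right side of (19), N h(p0) + K h(P_e))."""
    rng = random.Random(seed)
    blocks = list(product((0, 1), repeat=k))
    words = list(product((0, 1), repeat=n))
    joint = {}  # joint[(s, x, z)]: S uniform, random encoder, BSC(p0) wire-tap
    for s in blocks:
        weights = [rng.random() for _ in words]  # hypothetical encoder row q_E(.|s)
        total = sum(weights)
        for x, w in zip(words, weights):
            for z in words:
                flips = sum(a != b for a, b in zip(x, z))
                joint[(s, x, z)] = (w / total) * p0 ** flips * (1 - p0) ** (n - flips) / len(blocks)

    def H(*keep):
        """Entropy of the components of (S, X, Z) whose positions are in `keep`."""
        marg = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            marg[key] = marg.get(key, 0.0) + p
        return -sum(p * log2(p) for p in marg.values() if p > 0)

    k_delta = H(0, 2) - H(2)                      # H(S|Z)
    rhs_19 = (n * h(p0) + (H(0, 1) - H(1))        # N h(p0) + H(S|X)
              + (H(1) - H(2))                     # + [H(X) - H(Z)]
              - (H(0, 1, 2) - H(0, 2)))           # - H(X|S,Z)
    # MAP decoder from X (= Y here) and its per-bit error rate P_e as in (1).
    p_sx = {}
    for (s, x, _), p in joint.items():
        p_sx[(s, x)] = p_sx.get((s, x), 0.0) + p
    decoder = {x: max(blocks, key=lambda s: p_sx[(s, x)]) for x in words}
    p_e = sum(p * sum(a != b for a, b in zip(s, decoder[x]))
              for (s, x, _), p in joint.items()) / k
    return k_delta, rhs_19, n * h(p0) + k * h(p_e)


print(check_identity_19())
```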
We begin the proof of the direct half of Theorem 2 with a digression about group codes for the BSC. Let $G \subset \{0, 1\}^N$ be a group code (i.e., a parity check code) as defined, for example, in Ref. 1, Chapter 6, or Ref. 3, Chapter 4. The group code $G$ has $M = 2^N / |G|$ cosets. Denote the cosets by $C_0 = G, C_1, C_2, \ldots, C_{M-1}$. Of course, the cosets are disjoint and

$$\bigcup_{i=0}^{M-1} C_i = \{0, 1\}^N.$$

Let $\lambda$ be the word error probability when group code $G$ (or any of its cosets) is used on a BSC with crossover probability $p_0$, with maximum-likelihood (minimum distance) decoding. Thus, for each coset $C_i$, $0 \le i \le M - 1$, there exists a decoder mapping $D_i : \{0, 1\}^N \to C_i$ such that if $\mathbf{X}^N$ is the input to a BSC with crossover probability $p_0$, and $\mathbf{Z}^N$ is the corresponding output, then for all $\mathbf{x} \in C_i$, $0 \le i \le M - 1$,

$$\Pr\{D_i(\mathbf{Z}^N) \ne \mathbf{X}^N \mid \mathbf{X}^N = \mathbf{x}\} = \lambda.$$

Thus, regardless of the probability distribution for $\mathbf{X}^N$,

$$\Pr\{D_i(\mathbf{Z}^N) \ne \mathbf{X}^N \mid \mathbf{X}^N \in C_i\} = \lambda.$$
Letting $\psi(\mathbf{x}) = i$, for $\mathbf{x} \in C_i$, $0 \le i \le M - 1$, we have, from Fano's inequality [use ineq. (76) with $U = \mathbf{X}^N$, $V = \mathbf{Z}^N$, $\hat U = D_i(\mathbf{Z}^N)$],

$$H(\mathbf{X}^N \mid \mathbf{Z}^N, \psi = i) \le h(\lambda) + \lambda \log |G|.$$

Therefore, for any $\mathbf{X}$ distribution (which induces a distribution of $\psi$),

$$H(\mathbf{X}^N \mid \mathbf{Z}^N, \psi) \le h(\lambda) + \lambda \log |G|. \tag{21}$$

We conclude this digression by stating as a lemma the well-known result of Elias that there exists a group code for transmitting reliably over a BSC at any rate up to capacity. A proof of this result can be found in Ref. 1, Section 6.2.

Lemma 4: Let $\epsilon_1 > 0$, $r < 1 - h(p_0)$ be arbitrary. Then, provided $N$ is sufficiently large, there exists a group code $G$ of block length $N$ with $|G| \ge 2^{Nr}$, such that, on the BSC with crossover probability $p_0$, the error probability $\lambda \le \epsilon_1$.

We now prove the direct half of Theorem 2 for our special case by showing that any $(R, d)$, where $R$ is rational, which satisfies

$$R \cdot d = h(p_0), \tag{22a}$$

$$0 \le d < 1, \tag{22b}$$

$$0 \le R \le 1 \tag{22c}$$

is achievable. Thus, for $(R, d)$ satisfying (22), and arbitrary $\epsilon > 0$, we must show the existence of an encoder-decoder $(N, K, \Delta, P_e)$ that satisfies (9). We now proceed to this task.

Let $K, N$ satisfy

$$\frac{K}{N} = R. \tag{23}$$

Let $G$ be a binary group code with block length $N$ and with $|G| = 2^{N-K}$. Thus, $G$ has $M = 2^K$ cosets $\{C_i\}_{i=0}^{M-1}$. We can assume that the set $\mathcal{S}^K = \{0, 1\}^K$ is the set of integers $\{0, 1, \ldots, M - 1\}$. We construct the encoder such that when the source vector $\mathbf{S}^K = i$,* the encoder output $\mathbf{X}^N$ is a randomly chosen member of coset $C_i$, i.e.,

$$q_E(\mathbf{x} \mid i) = \begin{cases} \dfrac{1}{|C_i|} = 2^{-(N-K)}, & \mathbf{x} \in C_i, \\ 0, & \mathbf{x} \notin C_i, \end{cases}$$

$0 \le i \le M - 1$. Since $\mathbf{S}^K$ is uniformly distributed on $\{0, 1, \ldots, M - 1\}$, $\mathbf{X}^N$ is uniformly distributed on $\mathcal{X}^N = \{0, 1\}^N$. Thus, in particular,

$$H(\mathbf{X}^N) = H(\mathbf{Z}^N) = N, \tag{24}$$




* This is an abuse of notation. A more precise statement is that $\mathbf{S}^K$ is a binary representation of $i$.
where, as always, $\mathbf{Z}^N$ is the output of the wire-tap channel when $\mathbf{X}^N$ is the input. Also let us observe here that the quantity $\psi(\mathbf{X}^N)$, defined in the above digression, is identical to $\mathbf{S}^K$. Thus, (21) yields

$$H(\mathbf{X}^N \mid \mathbf{Z}^N, \mathbf{S}^K) \le h(\lambda) + \lambda (N - K), \tag{25}$$

where $\lambda$ is the error probability for the group code $G$.

We now turn to the decoder. Letting $D(\mathbf{y}) = i$ when $\mathbf{y} \in C_i$, we conclude (since the channel $Q_M$ is noiseless) that

$$P_e = 0. \tag{26}$$

Since (23) and (26) imply (9a) and (9c), it remains to show that a $G$ exists such that the resulting encoder-decoder will satisfy (9b).
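The coset construction can be tried on a small group code. The Python sketch below is our own illustration and assumes the [7,4] Hamming code as $G$, so that $N = 7$, $|G| = 2^{N-K} = 16$, $K = 3$ and $R = 3/7$. These values are not tuned to satisfy (22), and the code is far too short for Lemma 4 to say anything. The encoder sends $\mathbf{s}$ as a uniformly chosen member of the coset whose syndrome is $\mathbf{s}$; the decoder for the noiseless $Q_M$ reads the syndrome back, so $P_e = 0$ as in (26); and `equivocation` computes $\Delta$ exactly for a BSC($p_0$) wire-tap. All names are hypothetical.

```python
import random
from itertools import product
from math import log2

N, K = 7, 3
# Parity-check matrix of the [7,4] Hamming code: column j is the binary expansion of j + 1.
H_COLS = [tuple(((j + 1) >> b) & 1 for b in range(K)) for j in range(N)]
WORDS = list(product((0, 1), repeat=N))


def syndrome(x):
    """Syndrome of x; it is the all-zero K-tuple exactly when x is in G."""
    return tuple(sum(x[j] * H_COLS[j][b] for j in range(N)) % 2 for b in range(K))


# Cosets of G, indexed by syndrome; the source block s in {0,1}^K names its coset.
COSETS = {s: [x for x in WORDS if syndrome(x) == s] for s in product((0, 1), repeat=K)}


def encode(s, rng):
    """Randomized encoder: a uniformly chosen member of the coset with syndrome s."""
    return rng.choice(COSETS[s])


def decode(y):
    """Decoder for the noiseless main channel: the syndrome of y is the source block."""
    return syndrome(y)


def equivocation(p0):
    """Delta = H(S^K | Z^N) / K for uniform S^K and a BSC(p0) wire-tap."""
    cond_entropy = 0.0
    for z in WORDS:
        joint = []  # Pr{S^K = s, Z^N = z} for each coset s
        for coset in COSETS.values():
            mass = sum(p0 ** d * (1 - p0) ** (N - d)
                       for d in (sum(a != b for a, b in zip(x, z)) for x in coset))
            joint.append(mass / (len(COSETS) * len(coset)))
        p_z = sum(joint)
        cond_entropy -= sum(p * log2(p / p_z) for p in joint if p > 0)
    return cond_entropy / K


rng = random.Random(0)
s = (1, 0, 1)
x = encode(s, rng)
print(x, decode(x), round(equivocation(0.1), 4))
```

Since $\mathbf{X}^N$ is uniform here, (19) reduces to $K\Delta = N h(p_0) - H(\mathbf{X}^N \mid \mathbf{Z}^N, \mathbf{S}^K)$, so the computed $\Delta$ can never exceed $\min\{1, N h(p_0)/K\}$.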

We now invoke (19), which is valid for any encoder-decoder. Substituting (24) and (25) into (19), and invoking (26), which implies $H(\mathbf{S} \mid \mathbf{X}) = 0$, we obtain

$$\Delta \ge \frac{N}{K}\,h(p_0) - \frac{1}{K}\left[h(\lambda) + \lambda (N - K)\right]. \tag{27}$$

Now, from (22a) and (23), we have

$$\frac{N}{K}\,h(p_0) = \frac{h(p_0)}{R} = d,$$

and from (23),

$$\frac{N - K}{K} = \frac{1}{R} - 1.$$

Thus, (27) yields

$$\Delta \ge d - \left[\frac{h(\lambda)}{K} + \lambda\left(\frac{1}{R} - 1\right)\right]. \tag{28}$$

Finally, since from (23) and (22a) we have

$$|G| = 2^{N-K} = 2^{N[1 - h(p_0)/d]},$$

we can invoke Lemma 4 with $r = 1 - h(p_0)/d < 1 - h(p_0)$ [from (22b)] to assert the existence of a group code $G$ with $\lambda$ sufficiently small to make the term in brackets in (28) at most $\epsilon$. Then $\Delta \ge d - \epsilon$, which is (9b). This completes the proof of the direct half.
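The parameter choices in this proof reduce to a few lines of arithmetic. The Python sketch below (a hypothetical helper taking $p_0$ and $d$ as inputs) computes $R = h(p_0)/d$ from (22a), the exponent $r = 1 - R$ for which $|G| = 2^{N-K} = 2^{Nr}$ by (23), and checks the condition $r < 1 - h(p_0)$ needed for Lemma 4. That condition holds exactly when $d < 1$, which is why (22b) excludes $d = 1$.

```python
from math import log2


def h(lam):
    """Binary entropy function (3) in bits."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


def direct_half_parameters(p0, d):
    """Return (R, r, capacity) for a target equivocation d with h(p0) <= d <= 1.

    R = h(p0)/d is the rate from (22a); r = 1 - R is the exponent of
    |G| = 2^(N - K) = 2^(N r); capacity = 1 - h(p0) bounds r in Lemma 4.
    """
    if not h(p0) <= d <= 1.0:
        raise ValueError("need h(p0) <= d <= 1 so that 0 <= R <= 1")
    rate = h(p0) / d
    return rate, 1.0 - rate, 1.0 - h(p0)


for d in (0.6, 0.9, 0.99, 1.0):
    rate, r, cap = direct_half_parameters(0.1, d)
    print(f"d={d}: R={rate:.4f}, r={r:.4f}, Lemma 4 applies: {r < cap}")
```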

IV. CONVERSE THEOREM 

In this section, we establish the converse theorem that the family of achievable rates $\mathcal{R}$ is contained in $\bar{\mathcal{R}}$ as defined in (14). Suppose that $(R, d) \in \mathcal{R}$. That $R \le C_M$ follows from the ordinary converse to the coding theorem (Ref. 1, Theorem 4.3.4, p. 81). That $d \le H_S$ follows from

$$\Delta = \frac{1}{K} H(\mathbf{S}^K \mid \mathbf{Z}^N) \le \frac{1}{K} H(\mathbf{S}^K) = H_S.$$

Thus, it remains to show that $Rd \le H_S\,\Gamma(R)$. We do this via a lemma, the proof of which is given at the conclusion of this section.

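Lemma 5: Let $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Y}^N, \mathbf{Z}^N$ correspond to an encoder-decoder $(N, K, \Delta, P_e)$. Then

$$\text{(i)} \quad \frac{K}{N}\left[\Delta - \delta(P_e)\right] \le \frac{1}{N} \sum_{n=1}^{N} I(X_n; Y_n \mid Z_n, \mathbf{Y}^{n-1}), \tag{29a}$$

$$\text{(ii)} \quad \frac{K}{N}\left[H_S - \delta(P_e)\right] \le \frac{1}{N} \sum_{n=1}^{N} I(X_n; Y_n \mid \mathbf{Y}^{n-1}), \tag{29b}$$

where

$$\delta(P_e) = h(P_e) + P_e \log |\mathcal{S}|, \tag{29c}$$

and where the $n = 1$ term in the summations of (29a, b) is given the obvious interpretation, i.e., $I(X_1; Y_1 \mid Z_1, \mathbf{Y}^0) = I(X_1; Y_1 \mid Z_1)$, etc.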
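Now for $n = 2, 3, \ldots, N$, and any $\mathbf{y} \in \mathcal{Y}^{n-1}$, set

$$\alpha_n(\mathbf{y}) = I(X_n; Y_n \mid \mathbf{Y}^{n-1} = \mathbf{y}). \tag{30a}$$

Also let

$$\alpha_1 = I(X_1; Y_1). \tag{30b}$$

It follows from the definition of $\mathcal{P}(R)$ in Section II that the distribution $p_1$, defined by

$$p_1(x) = \Pr\{X_1 = x\}, \quad x \in \mathcal{X},$$

belongs to $\mathcal{P}(\alpha_1)$. Similarly, for $2 \le n \le N$, with $\mathbf{y} \in \mathcal{Y}^{n-1}$ fixed, define

$$p_{n,\mathbf{y}}(x) = \Pr\{X_n = x \mid \mathbf{Y}^{n-1} = \mathbf{y}\}, \quad x \in \mathcal{X}.$$

Then $p_{n,\mathbf{y}} \in \mathcal{P}[\alpha_n(\mathbf{y})]$. Thus, from (10) and the fact that channels $Q_M^{(N)}$ and $Q_{MW}^{(N)}$ are memoryless,

$$\Gamma(\alpha_1) \ge I(X_1; Y_1 \mid Z_1), \tag{31a}$$

and for $2 \le n \le N$, $\mathbf{y} \in \mathcal{Y}^{n-1}$,

$$\Gamma[\alpha_n(\mathbf{y})] \ge I(X_n; Y_n \mid Z_n, \mathbf{Y}^{n-1} = \mathbf{y}). \tag{31b}$$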

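It follows that the right member of (29a) is (giving the $n = 1$ term the obvious interpretation)

$$\begin{aligned}
\frac{1}{N} \sum_{n=1}^{N} I(X_n; Y_n \mid Z_n, \mathbf{Y}^{n-1})
&= \frac{1}{N} \sum_{n=1}^{N} \sum_{\mathbf{y} \in \mathcal{Y}^{n-1}} \Pr\{\mathbf{Y}^{n-1} = \mathbf{y}\}\, I(X_n; Y_n \mid Z_n, \mathbf{Y}^{n-1} = \mathbf{y}) \\
&\overset{(a)}{\le} \frac{1}{N} \sum_{n=1}^{N} \sum_{\mathbf{y}} \Pr\{\mathbf{Y}^{n-1} = \mathbf{y}\}\, \Gamma[\alpha_n(\mathbf{y})] \\
&\overset{(b)}{\le} \Gamma\!\left[\frac{1}{N} \sum_{n=1}^{N} \sum_{\mathbf{y}} \Pr\{\mathbf{Y}^{n-1} = \mathbf{y}\}\, \alpha_n(\mathbf{y})\right] \\
&\overset{(c)}{=} \Gamma\!\left[\frac{1}{N} \sum_{n=1}^{N} I(X_n; Y_n \mid \mathbf{Y}^{n-1})\right] \\
&\overset{(d)}{\le} \Gamma\!\left[\frac{K}{N}\left(H_S - \delta(P_e)\right)\right].
\end{aligned} \tag{32}$$

Step (a) follows from (31), step (b) from the concavity of $\Gamma$ [Lemma 1(ii)], step (c) from the definition of $\alpha_n$, and step (d) from (29b) and the monotonicity of $\Gamma$ [Lemma 1(iii)]. Applying (29a) to (32) yields

Corollary 6: For any encoder-decoder $(N, K, \Delta, P_e)$,

$$\frac{K}{N}\left[\Delta - \delta(P_e)\right] \le \Gamma\!\left[\frac{K}{N}\left(H_S - \delta(P_e)\right)\right]. \tag{33}$$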

We now show that, if $(R, d) \in \mathcal{R}$, then $Rd \le H_S\,\Gamma(R)$. Let $(R, d) \in \mathcal{R}$, and let $\epsilon > 0$ be arbitrary. Apply Corollary 6 to the encoder-decoder $(N, K, \Delta, P_e)$ that satisfies (9). Inequalities (33) and (9) yield

$$(R - \epsilon)\left[(d - \epsilon) - \delta(\epsilon)\right] \le H_S\,\Gamma\left[(R - \epsilon) - \delta(\epsilon)\right]. \tag{34}$$

Letting $\epsilon \to 0$ and invoking the continuity of $\Gamma$ [Lemma 1(iv)] yield $Rd \le H_S\,\Gamma(R)$, completing the proof of the converse. It remains to prove Lemma 5.

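Proof of Lemma 5:

(i) Let $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Y}^N, \mathbf{Z}^N$ correspond to an encoder-decoder $(N, K, \Delta, P_e)$. First observe that

$$\frac{1}{K} H(\mathbf{S}^K \mid \mathbf{Z}^N, \mathbf{Y}^N) \le \frac{1}{K} H(\mathbf{S}^K \mid \mathbf{Y}^N) \overset{(a)}{\le} h(P_e) + P_e \log (|\mathcal{S}| - 1) \le \delta(P_e). \tag{35}$$

Inequality (a) follows from Fano's inequality [use (78) with $V = \mathbf{Y}^N$]. Next, using the definition of $\Delta$ (7) and (35), write

$$\begin{aligned}
K\Delta = H(\mathbf{S}^K \mid \mathbf{Z}^N) &\le H(\mathbf{S}^K \mid \mathbf{Z}^N) - H(\mathbf{S}^K \mid \mathbf{Z}^N, \mathbf{Y}^N) + K\delta(P_e) \\
&= I(\mathbf{S}^K; \mathbf{Y}^N \mid \mathbf{Z}^N) + K\delta(P_e) \\
&\le I(\mathbf{X}^N; \mathbf{Y}^N \mid \mathbf{Z}^N) + K\delta(P_e).
\end{aligned} \tag{36}$$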
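The last inequality in (36) follows from the data-processing theorem, since given $\mathbf{Z}^N = \mathbf{z}$, $(\mathbf{Y}^N, \mathbf{X}^N, \mathbf{S}^K)$ is a Markov chain (Appendix A). Transposing the $K\delta(P_e)$ term in (36) and continuing:

$$\begin{aligned}
K[\Delta - \delta(P_e)] &\le I(\mathbf{X}^N; \mathbf{Y}^N \mid \mathbf{Z}^N) \\
&= H(\mathbf{X}^N \mid \mathbf{Z}^N) - H(\mathbf{X}^N \mid \mathbf{Z}^N, \mathbf{Y}^N) \\
&\overset{(a)}{=} H(\mathbf{X}^N \mid \mathbf{Z}^N) - H(\mathbf{X}^N \mid \mathbf{Y}^N) \\
&= I(\mathbf{X}^N; \mathbf{Y}^N) - I(\mathbf{X}^N; \mathbf{Z}^N) \\
&= H(\mathbf{Y}^N) - H(\mathbf{Z}^N) + H(\mathbf{Z}^N \mid \mathbf{X}^N) - H(\mathbf{Y}^N \mid \mathbf{X}^N) \\
&\overset{(b)}{=} \sum_{n=1}^{N} \left[H(Y_n \mid \mathbf{Y}^{n-1}) - H(Z_n \mid \mathbf{Z}^{n-1}) + H(Z_n \mid X_n) - H(Y_n \mid X_n)\right] \\
&\overset{(c)}{\le} \sum_{n=1}^{N} \left[H(Y_n \mid \mathbf{Y}^{n-1}) - H(Z_n \mid \mathbf{Z}^{n-1}, \mathbf{Y}^{n-1}) + H(Z_n \mid X_n) - H(Y_n \mid X_n)\right] \\
&\overset{(d)}{=} \sum_{n=1}^{N} \left[H(Y_n \mid \mathbf{Y}^{n-1}) - H(Z_n \mid \mathbf{Y}^{n-1}) + H(Z_n \mid X_n, \mathbf{Y}^{n-1}) - H(Y_n \mid X_n, \mathbf{Y}^{n-1})\right] \\
&= \sum_{n=1}^{N} \left[I(X_n; Y_n \mid \mathbf{Y}^{n-1}) - I(X_n; Z_n \mid \mathbf{Y}^{n-1})\right] \\
&= \sum_{n=1}^{N} \left[H(X_n \mid Z_n, \mathbf{Y}^{n-1}) - H(X_n \mid Y_n, \mathbf{Y}^{n-1})\right] \\
&\overset{(e)}{=} \sum_{n=1}^{N} \left[H(X_n \mid Z_n, \mathbf{Y}^{n-1}) - H(X_n \mid Y_n, Z_n, \mathbf{Y}^{n-1})\right] \\
&= \sum_{n=1}^{N} I(X_n; Y_n \mid Z_n, \mathbf{Y}^{n-1}).
\end{aligned} \tag{37}$$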

The steps in (37) that require explanation are:

(a) that follows from the fact that $\mathbf{X}^N, \mathbf{Y}^N, \mathbf{Z}^N$ is a Markov chain and (4);

(b) that follows from the standard identity

$$H(\mathbf{U}^N) = \sum_{n=1}^{N} H(U_n \mid \mathbf{U}^{n-1}),$$

and the fact that channels $Q_M^{(N)}$ and $Q_{MW}^{(N)}$ are memoryless;

(c) that follows from the fact that conditioning decreases entropy;

(d) that follows on applying (4) to the Markov chains $(\mathbf{Z}^{n-1}, \mathbf{Y}^{n-1}, Z_n)$, $(\mathbf{Y}^{n-1}, X_n, Y_n, Z_n)$;

1370 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1975 



(e) that follows from the fact that, given $\mathbf{Y}^{n-1}$, $(X_n, Y_n, Z_n)$ is a Markov chain.

Since (37), divided by $N$, is (29a), we have established part (i) of Lemma 5.

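(ii) With $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Y}^N, \mathbf{Z}^N$ as in part (i), write

$$H(\mathbf{S}^K) = I(\mathbf{S}^K; \mathbf{Y}^N) + H(\mathbf{S}^K \mid \mathbf{Y}^N) \le I(\mathbf{X}^N; \mathbf{Y}^N) + K\delta(P_e), \tag{38}$$

where the inequality follows from the data-processing theorem (since $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Y}^N$ is a Markov chain) and from Fano's inequality as in (35). Since $H(\mathbf{S}^K) = K H_S$, (38) yields

$$\begin{aligned}
K[H_S - \delta(P_e)] &\le I(\mathbf{X}^N; \mathbf{Y}^N) \\
&\overset{(a)}{=} \sum_{n=1}^{N} \left[H(Y_n \mid \mathbf{Y}^{n-1}) - H(Y_n \mid X_n)\right] \\
&\overset{(b)}{=} \sum_{n=1}^{N} \left[H(Y_n \mid \mathbf{Y}^{n-1}) - H(Y_n \mid X_n, \mathbf{Y}^{n-1})\right] \\
&= \sum_{n=1}^{N} I(X_n; Y_n \mid \mathbf{Y}^{n-1}).
\end{aligned} \tag{39}$$

Step (a) follows on application of $H(\mathbf{Y}^N) = \sum_n H(Y_n \mid \mathbf{Y}^{n-1})$ and the memorylessness of channel $Q_M$; and step (b) from the fact that $\mathbf{Y}^{n-1}, X_n, Y_n$ is a Markov chain. Inequality (39) is (29b), so that the proof of Lemma 5 is complete.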

V. DIRECT HALF OF THEOREM 2 

In this section we establish the direct (existence) part of Theorem 2, that is, $\bar{\mathcal{R}} \subseteq \mathcal{R}$. The first step is to establish two lemmas that are valid for any encoder-decoder as defined in Section II.

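Lemma 7: Let $\mathbf{S}^K, \mathbf{X}^N, \mathbf{Y}^N, \mathbf{Z}^N$ correspond to an arbitrary encoder-decoder $(N, K, \Delta, P_e)$. Then

$$K\Delta = H(\mathbf{S}^K \mid \mathbf{Z}^N) = H(\mathbf{S}^K) + I(\mathbf{X}^N; \mathbf{Z}^N \mid \mathbf{S}^K) - I(\mathbf{X}^N; \mathbf{Z}^N). \tag{40}$$

Proof: By repeatedly using the identity $H(U, V) = H(U) + H(V \mid U)$, we obtain (we have omitted superscripts)

$$\begin{aligned}
K\Delta = H(\mathbf{S} \mid \mathbf{Z}) &= H(\mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&= H(\mathbf{S}, \mathbf{Z}, \mathbf{X}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&= H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S}) + H(\mathbf{X}, \mathbf{S}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z}) - H(\mathbf{Z}) \\
&= H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S}) + H(\mathbf{S}) + [H(\mathbf{X} \mid \mathbf{S}) - H(\mathbf{X} \mid \mathbf{S}, \mathbf{Z})] - H(\mathbf{Z}) \\
&= H(\mathbf{S}) + I(\mathbf{X}; \mathbf{Z} \mid \mathbf{S}) - [H(\mathbf{Z}) - H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S})].
\end{aligned} \tag{41}$$

Now, since $\mathbf{S}, \mathbf{X}, \mathbf{Z}$ is a Markov chain, $H(\mathbf{Z} \mid \mathbf{X}, \mathbf{S}) = H(\mathbf{Z} \mid \mathbf{X})$ [by (4)]. Thus, the term in brackets in the right member of (41) is $I(\mathbf{X}; \mathbf{Z})$, completing the proof.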
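Identity (40) uses nothing beyond the Markov property of $\mathbf{S}, \mathbf{X}, \mathbf{Z}$, so it can be spot-checked on a random toy instance. In this Python sketch the source letter distribution, the encoder rows, and the channel $Q_{MW}$ are all randomly generated and hypothetical, with $K = 1$, $N = 2$, $|\mathcal{S}| = 3$, $|\mathcal{X}| = 2$, and $|\mathcal{Z}| = 3$ by default.

```python
import random
from itertools import product
from math import log2


def random_pmf(rng, size):
    """A random probability vector of the given length."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def check_lemma_7(seed=0, k=1, n=2, n_s=3, n_x=2, n_z=3):
    """Return both sides of (40) for a random source, encoder and channel Q_MW."""
    rng = random.Random(seed)
    p_s = random_pmf(rng, n_s)                          # hypothetical source letter pmf
    q_mw = [random_pmf(rng, n_z) for _ in range(n_x)]   # hypothetical Q_MW(z|x)
    inputs = list(product(range(n_x), repeat=n))
    outputs = list(product(range(n_z), repeat=n))
    joint = {}
    for s in product(range(n_s), repeat=k):
        p_block = 1.0
        for letter in s:
            p_block *= p_s[letter]                      # memoryless source
        q_e = random_pmf(rng, len(inputs))              # hypothetical encoder row q_E(.|s)
        for x, q in zip(inputs, q_e):
            for z in outputs:
                p_z = 1.0
                for xn, zn in zip(x, z):
                    p_z *= q_mw[xn][zn]                 # memoryless Q_MW^(N)
                joint[(s, x, z)] = p_block * q * p_z

    def H(*keep):
        """Entropy of the components of (S, X, Z) whose positions are in `keep`."""
        marg = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            marg[key] = marg.get(key, 0.0) + p
        return -sum(p * log2(p) for p in marg.values() if p > 0)

    left = H(0, 2) - H(2)                                      # K*Delta = H(S|Z)
    i_xz_given_s = (H(0, 1) - H(0)) - (H(0, 1, 2) - H(0, 2))   # H(X|S) - H(X|S,Z)
    i_xz = H(1) + H(2) - H(1, 2)
    return left, H(0) + i_xz_given_s - i_xz


print(check_lemma_7())
```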

We now give some preliminaries for the second of the two lemmas. For the remainder of this section we take the finite set $\mathcal{X}$ to be $\{1, 2, \ldots, A\}$. Let $X^*$ be a random variable that takes values in $\mathcal{X}$ with probability distribution

$$\Pr\{X^* = i\} = p_X^*(i), \quad 1 \le i \le A.$$

Let $Y^*$ and $Z^*$ be the outputs of channels $Q_M$ and $Q_{MW}$, respectively, when $X^*$ is the input. As always, $Q_{MW}$ is the cascade of $Q_M$ and $Q_W$, so that $X^*, Y^*, Z^*$ is a Markov chain. Next, for $1 \le i \le A$ and $\mathbf{x} \in \mathcal{X}^N$, define

$$\#(i, \mathbf{x}) = \operatorname{card}\{n : x_n = i\} = \text{number of occurrences of the symbol } i \text{ in the } N\text{-vector } \mathbf{x}. \tag{42}$$

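For $N = 1, 2, \ldots$, define the set of "typical" $\mathcal{X}$ sequences as the set

$$T^* = T^*(N) = \left\{\mathbf{x} \in \mathcal{X}^N : \left|\frac{\#(i, \mathbf{x})}{N} - p_X^*(i)\right| \le \delta_N,\ 1 \le i \le A\right\}, \tag{43a}$$

where

$$\delta_N = N^{-1/3}. \tag{43b}$$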

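Let us remark in passing that the random $N$-vector $\mathbf{X}^{*N}$ consisting of $N$ independent copies of $X^*$ satisfies $E\,\#(i, \mathbf{X}^{*N}) = N p_X^*(i)$, and $\operatorname{Var}[\#(i, \mathbf{X}^{*N})] = N p_X^*(i)[1 - p_X^*(i)]$, for $1 \le i \le A$. Thus, by Chebyshev's inequality

$$\Pr\{\mathbf{X}^{*N} \notin T^*(N)\} \le \sum_{i=1}^{A} \Pr\left\{\left|\#(i, \mathbf{X}^{*N}) - N p_X^*(i)\right| > N\delta_N\right\} \le \sum_{i=1}^{A} \frac{\operatorname{Var}[\#(i, \mathbf{X}^{*N})]}{(N\delta_N)^2} = O\!\left(N^{-1/3}\right) \to 0, \tag{44}$$

as $N \to \infty$.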
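For a binary $X^*$ ($A = 2$) the probability in (44) can be computed exactly from the binomial distribution and compared with the Chebyshev bound. The Python sketch below is our own check and takes $\delta_N = N^{-1/3}$ as in (43b). With $A = 2$ the two conditions in (43a) coincide, so only the count of one symbol matters, while the bound still sums over both symbols as (44) does.

```python
from math import comb


def atypical_probability(p, n):
    """Exact Pr{X*^N not in T*(N)} for binary X* with Pr{X* = 2} = p."""
    delta = n ** (-1 / 3)
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(n + 1) if abs(j / n - p) > delta)


def chebyshev_bound(p, n):
    """Middle bound in (44) for A = 2: sum over both symbols of Var / (N delta_N)^2."""
    delta = n ** (-1 / 3)
    return 2 * n * p * (1 - p) / (n * delta) ** 2


for n in (10, 100, 1000):
    print(n, atypical_probability(0.3, n), chebyshev_bound(0.3, n))
```

The exact probability is typically many orders of magnitude below the bound; Chebyshev's inequality is used in (44) only because it suffices to show the probability tends to zero.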

We can now state the second of our lemmas. We give the proof at the conclusion of this section.

Lemma 8: Let $\mathbf{X}^N, \mathbf{Z}^N$ correspond to an arbitrary encoder and let $X^*, Z^*, T^*$ correspond to an arbitrary $p_X^*$ as above. Then

$$\frac{1}{N} I(\mathbf{X}^N; \mathbf{Z}^N) \le I(X^*; Z^*) + (\log A) \Pr\{\mathbf{X}^N \notin T^*(N)\} + \mu(N),$$

where $\mu(N) \to 0$, as $N \to \infty$.

Lemma 8 implies that, if the encoder is such that with high probability $\mathbf{X}^N \in T^*$, then $(1/N) I(\mathbf{X}^N; \mathbf{Z}^N)$ cannot be much more than $I(X^*; Z^*)$.

Lemmas 7 and 8 hold for any encoder-decoder. Our next step is to 
describe a certain ad-hoc encoder-decoder and deduce several of its 
properties. We then show that when the parameters of the ad-hoc 
scheme are properly chosen, the direct half of Theorem 2 will follow 
easily. 

We begin the discussion of the ad-hoc scheme by reviewing some facts about source coding. With the source given as in Section II, for $K = 1, 2, \ldots$, there exists a ("source encoder") mapping $F_E: \mathcal{S}^K \to \{1, 2, \ldots, M\}$, where

$$M = 2^{K H_S (1 + \delta_K)} \tag{45}$$

and 5 K = K~K Let F D : {1, 2, • • •, M) — > S* be a ("source decoder") 
mapping, and let 

P<f> = Pr \F D oF E (S K ) * S*} 

be the resulting error probability. It is very well known that there 
exists (for each K) a pair (Fe, Fd) such that, as K — > oo , 



$$P_e^{(K)} = \Pr\{F_D(W) \ne S^K\} \to 0, \tag{46a}$$

where

$$W = F_E(S^K). \tag{46b}$$

We will design our system to transmit $W$ using an $(F_E, F_D)$ that satisfies (46).
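A toy version of a source code satisfying (45) and (46) can be sketched in Python, assuming a memoryless Bernoulli($p$) source and $\delta_K = K^{-1/3}$ (both assumptions for illustration, as is the helper name `typical_source_code`). Every block whose self-information is at most $KH_S(1+\delta_K)$ bits gets its own index; all remaining blocks share one extra index, on which $F_D$ errs, so $P_e^{(K)}$ is the probability of the remaining blocks.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_source_code(k_len: int, p: float, delta: float) -> tuple[int, float]:
    """Hypothetical (F_E, F_D) for a Bernoulli(p) source, 0 < p < 1.

    Returns (number of individually indexed blocks, error probability P_e^(K)).
    """
    threshold = k_len * binary_entropy(p) * (1 + delta)  # K H_S (1 + delta_K) bits
    size = 0
    covered = 0.0
    for ones in range(k_len + 1):
        # self-information -log2 Pr{s} of any block s with `ones` ones
        neg_log_prob = -(ones * math.log2(p) + (k_len - ones) * math.log2(1 - p))
        if neg_log_prob <= threshold:
            count = math.comb(k_len, ones)
            size += count
            # total probability of these blocks, computed in the log domain
            covered += 2.0 ** (math.log2(count) - neg_log_prob)
    return size, 1.0 - covered


for k_len in (100, 1000):
    size, p_err = typical_source_code(k_len, 0.1, k_len ** (-1 / 3))
    print(k_len, math.log2(size + 1), p_err)
```

Because each indexed block has probability at least $2^{-KH_S(1+\delta_K)}$, at most $2^{KH_S(1+\delta_K)}$ blocks can be indexed, so the size of the index set agrees with (45) up to the single extra index.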

We now turn to our ad-hoc system. (Refer to Fig. 4.) The source output is the vector $S^K$, and the output of the source encoder is $W = F_E(S^K)$. Let

$$q_i = \Pr\{W = F_E(S^K) = i\}, \qquad 1 \le i \le M. \tag{47}$$



[Fig. 4 diagram: SOURCE ENCODER, $W = F_E(S^K)$; CHANNEL ENCODER, output $X^N$; CHANNEL $Q_M$, output $Y^N$; SOURCE DECODER, $\hat{S}^K = F_D(\hat{W})$.]

Fig. 4 — Ad-hoc encoder-decoder. 
Next, let $M_1 = M_2 M$ be a multiple of $M$ to be specified later. Let $\{\mathbf{x}_m\}_{m=1}^{M_1}$ be a subset of $\mathcal{X}^N$. Clearly, $\{\mathbf{x}_m\}$ can be viewed as a channel code for channel $Q_M^{(N)}$ or channel $Q_{MW}^{(N)}$. The channel encoder and decoder in Fig. 4 work as follows. The channel encoder and decoder each contain a partition of $\{\mathbf{x}_m\}_1^{M_1}$ into $M$ subcodes $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_M$, each with cardinality $M_2$. Assume that

$$\mathcal{C}_i = \{\mathbf{x}_{(i-1)M_2+1}, \ldots, \mathbf{x}_{iM_2}\}, \qquad 1 \le i \le M. \tag{48}$$

When the random variable $W = i$, the channel encoder output $X^N$ is a (uniformly) randomly chosen member of the subcode $\mathcal{C}_i$. Thus, for $1 \le i \le M$, $1 \le j \le M_2$,

$$\Pr\{X^N = \mathbf{x}_{(i-1)M_2+j} \mid W = i\} = \frac{1}{M_2}, \tag{49a}$$

and

$$\Pr\{X^N = \mathbf{x}_{(i-1)M_2+j}\} = \frac{q_i}{M_2}. \tag{49b}$$
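The bookkeeping in (48) and (49) can be sketched as follows. The function names are hypothetical, code words are represented only by their 1-based indices $m$, and exact fractions are used so that (49b) can be checked without rounding.

```python
import random
from fractions import Fraction


def subcode_indices(i: int, m2: int) -> range:
    """1-based indices m of the code words x_m that make up subcode C_i, as in (48)."""
    return range((i - 1) * m2 + 1, i * m2 + 1)


def channel_encode(i: int, m2: int, rng: random.Random) -> int:
    """Given W = i, pick the transmitted code word uniformly from C_i, as in (49a)."""
    return rng.choice(subcode_indices(i, m2))


def codeword_distribution(q: list[Fraction], m2: int) -> dict[int, Fraction]:
    """Unconditional Pr{X^N = x_m} implied by (49b): q_i / M_2 for each m in C_i."""
    dist: dict[int, Fraction] = {}
    for i, q_i in enumerate(q, start=1):
        for m in subcode_indices(i, m2):
            dist[m] = q_i / m2
    return dist


q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical q_i, M = 3
print(codeword_distribution(q, 4))                     # M_2 = 4, so M_1 = 12
```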

Now the set $\{\mathbf{x}_m\}_1^{M_1}$ can be thought of as a channel code for channel $Q_M^{(N)}$ with prior probability distribution on the code words given by (49b). A decoder for the code is a mapping $G: \mathcal{Y}^N \to \{\mathbf{x}_m\}_1^{M_1}$, and the (word) error probability is

$$\lambda = \Pr\{G(Y^N) \ne X^N\}, \tag{50}$$

where $Y^N$ is the output of $Q_M^{(N)}$ when the input $X^N$ has distribution given by (49b). We assume that the channel decoder in Fig. 4 has stored the mapping $G$. When the channel output is $\mathbf{y} \in \mathcal{Y}^N$, the channel decoder computes $G(\mathbf{y})$. When $G(\mathbf{y}) \in \mathcal{C}_i$, the channel decoder output is $i$, $1 \le i \le M$. Letting $\hat{W}$ be the output of the channel decoder, we have

$$\Pr\{\hat{W} \ne W\} \le \lambda.$$

The final step in the system of Fig. 4 is the emission by the source decoder of $\hat{S}^K = F_D(\hat{W})$, where $F_D: \{1, 2, \ldots, M\} \to \mathcal{S}^K$ is chosen so that (46) holds. We have

$$\Pr\{\hat{S}^K = S^K\} = \Pr\{S^K = F_D(\hat{W})\} \ge \Pr\{S^K = F_D(W),\ \hat{W} = W\}.$$

Thus,

$$P_e \le \Pr\{\hat{S}^K \ne S^K\} \le \Pr\{S^K \ne F_D(W)\} + \Pr\{\hat{W} \ne W\} \le P_e^{(K)} + \lambda. \tag{51}$$
Next, let us observe that each of the subcodes $\mathcal{C}_i$ can be considered a code for channel $Q_{MW}^{(N)}$ with $M_2$ code words and uniform prior distribution on the code words. Let $\lambda_i$ be the resulting (word) error probability for code $\mathcal{C}_i$ ($1 \le i \le M$) with an optimal decoder, and let

$$\bar{\lambda} = \sum_{i=1}^{M} q_i \lambda_i. \tag{52}$$

We now establish 

Lemma 9: For the ad-hoc encoder-decoder defined above,

$$I(X^N; Z^N \mid S^K) \ge \log M_2 - [h(\bar{\lambda}) + \bar{\lambda} \log M_2].$$

Proof: Let $S^K$ be such that $W = F_E(S^K) = i$. Then the channel input $X^N$ given $W = i$ has distribution given by (49a), i.e., $X^N$ is a randomly chosen member of $\mathcal{C}_i$. Since $\lambda_i$ is the error probability for code $\mathcal{C}_i$ used on channel $Q_{MW}^{(N)}$, Fano's inequality [use (76) with $U = X^N$, $V = Z^N$, $\hat{U}$ = the decoded version of $Z^N$ when code $\mathcal{C}_i$ is used] yields

$$H(X^N \mid Z^N, W = i) \le h(\lambda_i) + \lambda_i \log M_2,$$

and, since $H(X^N \mid W = i) = \log M_2$, we have

$$I(X^N; Z^N \mid W = i) \ge \log M_2 - h(\lambda_i) - \lambda_i \log M_2.$$

Averaging over $i$ using the weighting $\{q_i\}$, and using the concavity of $h(\cdot)$, we have

$$I(X^N; Z^N \mid W) \ge \log M_2 - [h(\bar{\lambda}) + \bar{\lambda} \log M_2]. \tag{53}$$

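Finally, since $S^K, W, X^N, Z^N$ is a Markov chain, (4) yields

$$\begin{aligned} I(X^N; Z^N \mid W) &= H(Z^N \mid W) - H(Z^N \mid X^N, W) \\ &= H(Z^N \mid W, S^K) - H(Z^N \mid X^N) \\ &= H(Z^N \mid W, S^K) - H(Z^N \mid X^N, S^K) \\ &\le H(Z^N \mid S^K) - H(Z^N \mid X^N, S^K) = I(X^N; Z^N \mid S^K). \end{aligned} \tag{54}$$

Inequalities (53) and (54) imply Lemma 9.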
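The two steps behind (53), per-subcode Fano bounds followed by averaging with the concavity of $h(\cdot)$, can be checked numerically. The Python sketch below uses hypothetical values of $q_i$ and $\lambda_i$ and logarithms to base 2; it confirms that the $q$-average of the per-subcode bounds is at least the bound stated in Lemma 9.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lemma9_bound(m2: int, q: list[float], lam: list[float]) -> float:
    """log M2 - [h(lam_bar) + lam_bar log M2], with lam_bar from (52)."""
    lam_bar = sum(qi * li for qi, li in zip(q, lam))
    return math.log2(m2) - (h(lam_bar) + lam_bar * math.log2(m2))


def averaged_per_subcode_bound(m2: int, q: list[float], lam: list[float]) -> float:
    """q-weighted average of the per-subcode bounds log M2 - h(lam_i) - lam_i log M2."""
    return sum(qi * (math.log2(m2) - h(li) - li * math.log2(m2)) for qi, li in zip(q, lam))


q, lam = [0.5, 0.25, 0.25], [0.01, 0.2, 0.05]  # hypothetical q_i and lambda_i
print(averaged_per_subcode_bound(64, q, lam), lemma9_bound(64, q, lam))
```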

We are now ready to combine the above lemmas as:

Corollary 10: Let $p^*_X$ be an arbitrary probability distribution on $\mathcal{X}$, and let $T^*_X(N), X^*, Y^*, Z^*$ be as defined above (corresponding to $p^*_X$). Assume that $S^K, X^N, Y^N, Z^N$ correspond to the above ad-hoc encoder-decoder with parameters $N, K, M, M_1, M_2, \lambda, \bar{\lambda}$. Let $P_e$ and $\Delta$ correspond to this ad-hoc scheme. Then

$$P_e \le P_e^{(K)} + \lambda \tag{55a}$$
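and

$$\Delta \ge H_S + \frac{N}{K}\left[\frac{1}{N}\bigl(\log M_2 - h(\bar{\lambda}) - \bar{\lambda}\log M_2\bigr) - I(X^*; Z^*) - (\log A)\Pr\{X^N \notin T^*(N)\} - f_1(N)\right], \tag{55b}$$

where $f_1(N) \to 0$ as $N \to \infty$.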

Proof: Inequality (55a) is the same as (51). Inequality (55b) is obtained by substituting the results of Lemmas 8 and 9 into (40) and using $H(S^K) = KH_S$.

Finally, we are ready to prove the direct half of Theorem 2. We do this by showing that any pair $(R, d)$ which satisfies

$$Rd = H_S\,\Gamma(R), \tag{56a}$$

$$0 \le R \le C_M, \tag{56b}$$

$$0 \le d \le H_S, \tag{56c}$$

is achievable. Thus, for $(R, d)$ satisfying (56) and for arbitrary $\epsilon > 0$, we show that our ad-hoc scheme with appropriately chosen parameters satisfies (9). To begin with, choose $K, N$ to satisfy

$$\frac{K}{N} = \frac{R}{H_S}. \tag{57}$$

(Assume that $R/H_S$ is rational.) Note that (57) implies (9a). Also, let $p^*_X$ be a distribution on $\mathcal{X}$ that belongs to $\mathcal{P}(R)$ and achieves $\Gamma(R)$, that is,

$$I(X^*; Y^*) \ge R, \qquad I(X^*; Y^*) - I(X^*; Z^*) = I(X^*; Y^* \mid Z^*) = \Gamma(R), \tag{58}$$

where $X^*, Y^*, Z^*$ correspond to $p^*_X$. We now assume that an encoder-decoder is constructed according to the above ad-hoc scheme with the parameter*

$$M_1 = \exp_2\left\{N\left[I(X^*; Y^*) - \frac{\epsilon R}{2H_S}\right]\right\}, \tag{59}$$

where $X^*, Y^*$ correspond to the above choice of $p^*_X$. With this choice of $M_1$, and with $M$ given by (45), we have

$$M_2 = \frac{M_1}{M} = \exp_2\left\{N\left[I(X^*; Y^*) - \frac{K}{N}H_S - \frac{K}{N}H_S\delta_K - \frac{\epsilon R}{2H_S}\right]\right\}. \tag{60}$$

* Assume that the right member of (59) is an integer. If not, a trivial modification of the sequel is necessary.

Note that, from (57),
$$\begin{aligned} \frac{1}{N}\log M_2 &\overset{(a)}{=} I(X^*; Y^*) - R - R\delta_K - \frac{\epsilon R}{2H_S} \\ &\overset{(b)}{\le} I(X^*; Y^*) - \Gamma(R) - R\delta_K - \frac{\epsilon R}{2H_S} \\ &= I(X^*; Y^*) - I(X^*; Y^* \mid Z^*) - R\delta_K - \frac{\epsilon R}{2H_S} \\ &\overset{(c)}{=} I(X^*; Z^*) - R\delta_K - \frac{\epsilon R}{2H_S}. \end{aligned} \tag{61}$$

Step (a) follows from (57), step (b) from (56a) and (56c), and step (c) from the fact that $X^*, Y^*, Z^*$ is a Markov chain; see (11).
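The rate bookkeeping in (59) through (61) can be traced with concrete numbers. The Python sketch assumes, purely for illustration, a noiseless binary main channel, a binary symmetric wiretap channel with crossover probability 0.1 and a uniform $p^*_X$, so that $I(X^*;Y^*) = 1$ and $I(X^*;Z^*) = 1 - h(0.1)$; it also assumes that this $p^*_X$ achieves $\Gamma(R)$, so that $\Gamma(R) = I(X^*;Y^*) - I(X^*;Z^*)$ as in (58). The values of $R$, $H_S$, $\epsilon$ and $\delta_K$ are arbitrary.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def code_rates(i_xy: float, r: float, h_s: float, eps: float, delta_k: float):
    """Per-letter rates (1/N) log M1, (1/N) log M and (1/N) log M2.

    M1 follows (59); M follows (45) with K/N = R/H_S from (57); M2 = M1 / M as in (60).
    """
    rate_m1 = i_xy - eps * r / (2 * h_s)
    rate_m = r * (1 + delta_k)  # (K/N) H_S (1 + delta_K) with K/N = R/H_S
    return rate_m1, rate_m, rate_m1 - rate_m


i_xy, i_xz = 1.0, 1.0 - h(0.1)          # assumed channel pair and input, see lead-in
r, h_s, eps, dk = 0.8, 1.0, 0.1, 0.01   # arbitrary illustrative parameters
m1, m, m2 = code_rates(i_xy, r, h_s, eps, dk)
print(m1, m, m2, i_xz - r * dk - eps * r / (2 * h_s))  # last value: right member of (61)
```

The comparison in the last line is the content of (61): the subcode rate stays strictly below $I(X^*;Z^*)$, which is what later lets the wire-tapper "decode" each subcode.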

Let us now apply Corollary 10 to the ad-hoc scheme with the above choice of $M_1, M_2$, and with the above choice of $p^*_X$. Inequality (55a) remains

$$P_e \le P_e^{(K)} + \lambda, \tag{62}$$

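and substituting (60) into (55b) yields

$$\frac{R\Delta}{H_S} \ge I(X^*; Y^*) - I(X^*; Z^*) - f_2(N) = \Gamma(R) - f_2(N), \tag{63a}$$

where

$$f_2(N) = R\delta_K + \frac{\epsilon R}{2H_S} + \frac{1}{N}\bigl[h(\bar{\lambda}) + \bar{\lambda}\log M_2\bigr] + (\log A)\Pr\{X^N \notin T^*(N)\} + f_1(N). \tag{63b}$$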

Now observe that $f_2(N)$ and $\lambda$ depend on the choice of the set $\{\mathbf{x}_m\}_1^{M_1}$. The following lemma asserts the existence of a set $\{\mathbf{x}_m\}$ for which these quantities are small. Its proof is given at the end of this section.

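Lemma 11: With $p^*_X$ and $M_1, M_2$ as given above, there exists for arbitrary $N$ a set $\{\mathbf{x}_m\}_1^{M_1} \subset \mathcal{X}^N$ such that

$$\Pr\{X^N \notin T^*(N)\},\ \lambda,\ \bar{\lambda} \le f_3(N), \tag{64}$$

where $f_3(N) \to 0$ as $N \to \infty$.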
Now let the set $\{\mathbf{x}_m\}_1^{M_1}$ in the ad-hoc scheme be chosen to satisfy (64). Then, from (62) and (64) [using the fact that $P_e^{(K)} \to 0$ as $K \to \infty$ (46)], we can choose $N$ (and $K = NR/H_S$) sufficiently large so that

$$P_e \le \epsilon;$$

this is (9c). It remains to establish (9b). But from (64) with $N$ sufficiently large, we can make

$$f_2(N) = R\delta_K + \frac{\epsilon R}{2H_S} + \frac{h(\bar{\lambda}) + \bar{\lambda}\log M_2}{N} + (\log A)\Pr\{X^N \notin T^*(N)\} + f_1(N) \le \frac{\epsilon R}{H_S}.$$

Then (63) and (56a) yield

$$\frac{R\Delta}{H_S} \ge \Gamma(R) - \frac{\epsilon R}{H_S} = \frac{R(d - \epsilon)}{H_S}, \quad \text{i.e.,} \quad \Delta \ge d - \epsilon,$$

which is (9b). Thus, $(R, d)$ is achievable, and the proof of the direct half of Theorem 2 (every pair satisfying (56) is achievable) is complete. It remains to prove Lemmas 11 and 8.

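Proof of Lemma 11: We begin with some notation. For $\mathbf{x} \in \mathcal{X}^N$, let

$$\mu(\mathbf{x}) = \begin{cases} 0, & \mathbf{x} \in T^*, \\ 1, & \mathbf{x} \notin T^*. \end{cases} \tag{65}$$

Also, for a given set $\{\mathbf{x}_m\}_1^{M_1}$, let $\lambda^{(m)}(\mathbf{x}_1, \ldots, \mathbf{x}_{M_1})$ be the error probability that results when $\{\mathbf{x}_m\}$ is used as a channel code for channel $Q_M^{(N)}$ with prior probabilities (49b), when code word $\mathbf{x}_m$ is transmitted and maximum-likelihood decoding is used. Thus,

$$\lambda = \sum_{i=1}^{M} \sum_{m=(i-1)M_2+1}^{iM_2} \frac{q_i}{M_2}\,\lambda^{(m)}(\mathbf{x}_1, \ldots, \mathbf{x}_{M_1}).$$

Further, with $\lambda_i$ defined as above as the error probability for code $\mathcal{C}_i$ on $Q_{MW}^{(N)}$, write $\lambda_i = \lambda_{MW}(\mathbf{x}_{(i-1)M_2+1}, \ldots, \mathbf{x}_{iM_2}) = \lambda_{MW}(\mathcal{C}_i)$, so that the dependence of $\lambda_i$ on $\mathcal{C}_i$ is explicit. We have

$$\bar{\lambda} = \sum_{i=1}^{M} q_i \lambda_i = \sum_{i=1}^{M} q_i \lambda_{MW}(\mathcal{C}_i).$$

Finally, define

$$\begin{aligned} \Phi(\mathbf{x}_1, \ldots, \mathbf{x}_{M_1}) &= \Pr\{X^N \notin T^*(N)\} + \lambda + \bar{\lambda} \\ &= \sum_{i=1}^{M} \sum_{m=(i-1)M_2+1}^{iM_2} \frac{q_i}{M_2}\left[\mu(\mathbf{x}_m) + \lambda^{(m)}(\mathbf{x}_1, \ldots, \mathbf{x}_{M_1})\right] + \sum_{i=1}^{M} q_i \lambda_{MW}(\mathcal{C}_i). \end{aligned} \tag{66}$$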

Now suppose that the set $\{\mathbf{x}_m\}_1^{M_1}$ is chosen at random, with each $\mathbf{x}_m$ chosen independently from $\mathcal{X}^N$ with probability distribution $p_X^{(N)}(\mathbf{x}) = \prod_{n=1}^{N} p^*_X(x_n)$. We establish the lemma by showing that $E\Phi \le f_3(N)$; since $\Phi$ is a sum of three nonnegative terms, some choice of the set then satisfies (64). Now observe that, from (59), $(1/N)\log M_1$ is bounded strictly below $I(X^*; Y^*)$. Also, from (61), $(1/N)\log M_2$ is bounded strictly below $I(X^*; Z^*)$. It follows from the standard random channel-coding theorem (see, for example, Ref. 1, Theorem 5.6.2) that $E\lambda^{(m)}, E\lambda_{MW} \le f_4(N) \to 0$ as $N \to \infty$. Further, $E\mu = \Pr\{X^{*N} \notin T^*_X(N)\} \le f_5(N) \to 0$, by (44). Thus, $E\Phi \le 2f_4(N) + f_5(N) = f_3(N) \to 0$. Hence the lemma.
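The random-coding step can be illustrated on a very small scale. The Python sketch below computes the exact word-error probability of maximum-likelihood decoding for a codebook on a binary symmetric channel, using $\Pr\{\text{correct}\} = (1/M)\sum_{\mathbf{y}} \max_m P(\mathbf{y} \mid \mathbf{x}_m)$, and then draws several random codebooks. It assumes a uniform prior on code words rather than (49b) and i.i.d. uniform code-word symbols; the function names are hypothetical.

```python
import itertools
import random


def ml_error_bsc(codebook: list[tuple[int, ...]], p: float) -> float:
    """Exact average word-error probability of ML decoding on a BSC(p), uniform prior."""
    n = len(codebook[0])
    correct = 0.0
    for y in itertools.product((0, 1), repeat=n):
        best = 0.0
        for x in codebook:
            flips = sum(a != b for a, b in zip(x, y))
            best = max(best, p ** flips * (1 - p) ** (n - flips))
        correct += best  # the ML decoder is right with probability max_m P(y | x_m)
    return 1.0 - correct / len(codebook)


def random_code_errors(n: int, m: int, p: float, trials: int, seed: int = 0) -> list[float]:
    """ML error of `trials` codebooks of m words drawn i.i.d. uniformly from {0,1}^n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        code = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
        errors.append(ml_error_bsc(code, p))
    return errors


errs = random_code_errors(8, 4, 0.05, 20)
print(min(errs), sum(errs) / len(errs))
```

Since the minimum over realizations never exceeds the average, a good realization exists whenever the average is small; this is the same logic used to pass from $E\Phi \le f_3(N)$ to (64).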

Proof of Lemma 8: Here too we begin with some notation. Let $\mathbf{p}$ be a probability distribution on $\mathcal{X}$, and let $\mathcal{J}(\mathbf{p})$ be the mutual information between the input and output of channel $Q_{MW}$ when the input has distribution $\mathbf{p}$. It is known (Ref. 1, Theorem 4.4.2) that $\mathcal{J}(\mathbf{p})$ is a concave function of $\mathbf{p}$. Let $\mu(\mathbf{x})$ be as in (65), and write (for any encoder-decoder)

$$\begin{aligned} \frac{1}{N} I(X^N; Z^N) &= \frac{1}{N} I[X^N, \mu(X^N); Z^N] \\ &= \frac{1}{N} I[X^N; Z^N \mid \mu(X^N)] + \frac{1}{N} I[\mu(X^N); Z^N] \\ &= \frac{1}{N} \sum_{j=0}^{1} \Pr\{\mu(X^N) = j\}\, I(X^N; Z^N \mid \mu(X^N) = j) + \frac{1}{N} I[\mu(X^N); Z^N]. \end{aligned} \tag{67}$$

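Now, since $I(X^N; Z^N \mid \mu(X^N) = 1) \le H(X^N \mid \mu(X^N) = 1) \le N\log A$ and $\Pr\{\mu(X^N) = 1\} = \Pr\{X^N \notin T^*(N)\}$,

$$\frac{1}{N} \Pr\{\mu(X^N) = 1\}\, I[X^N; Z^N \mid \mu(X^N) = 1] \le (\log A) \Pr\{X^N \notin T^*(N)\}, \tag{68}$$

and, since $\mu(X^N)$ is binary,

$$\frac{1}{N} I[\mu(X^N); Z^N] \le \frac{1}{N} H[\mu(X^N)] \le \frac{1}{N}. \tag{69}$$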

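One term remains in (67). Using the memoryless property of channel $Q_{MW}$ (Ref. 1, Theorem 4.2.1), we have

$$\frac{1}{N} I(X^N; Z^N \mid \mu = 0) \le \frac{1}{N} \sum_{n=1}^{N} I(X_n; Z_n \mid \mu = 0) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{J}(\mathbf{p}_n) \le \mathcal{J}\!\left(\frac{1}{N} \sum_{n=1}^{N} \mathbf{p}_n\right), \tag{70a}$$

where $\mathbf{p}_n$ is the probability distribution for $X_n$ given $\mu = 0$, i.e., for $1 \le i \le A$,

$$p_n(i) = \sum_{\mathbf{x} \in T^*:\ x_n = i} \Pr\{X^N = \mathbf{x} \mid X^N \in T^*\}. \tag{70b}$$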

The last inequality in (70a) follows from the concavity of $\mathcal{J}$. From (70b),

$$\bar{p}(i) \triangleq \frac{1}{N} \sum_{n=1}^{N} p_n(i) = \sum_{\mathbf{x} \in T^*} \Pr\{X^N = \mathbf{x} \mid X^N \in T^*\}\, \frac{\#(i, \mathbf{x})}{N}. \tag{71}$$

The definition of $T^*$ (43) and (71) yield

$$|\bar{p}(i) - p^*_X(i)| \le \delta_N \to 0, \quad \text{as } N \to \infty.$$

Since $\mathcal{J}(\mathbf{p})$ is a continuous function of $\mathbf{p}$, we have

$$|\mathcal{J}(\bar{\mathbf{p}}) - \mathcal{J}(\mathbf{p}^*_X)| \le g(N) \to 0, \quad \text{as } N \to \infty. \tag{72}$$

Substituting (72) into (70a), we obtain

$$\frac{1}{N} \Pr\{\mu = 0\}\, I(X^N; Z^N \mid \mu = 0) \le \mathcal{J}(\mathbf{p}^*_X) + g(N) = I(X^*; Z^*) + g(N). \tag{73}$$

Finally, setting $f_1(N) = (1/N) + g(N)$, and substituting (68), (69), and (73) into (67), we have Lemma 8.
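The concavity step in (70a) can be checked numerically. The Python sketch computes the mutual information $\mathcal{J}(\mathbf{p})$ of a discrete memoryless channel given as a transition matrix, and evaluates the Jensen gap $\mathcal{J}(\bar{\mathbf{p}}) - (1/N)\sum_n \mathcal{J}(\mathbf{p}_n)$. Taking $Q_{MW}$ to be a binary symmetric channel with crossover probability 0.1, and the particular $\mathbf{p}_n$ used, are assumptions for illustration.

```python
import math


def mutual_information(p_in: list[float], channel: list[list[float]]) -> float:
    """I(X;Z) in bits for input pmf p_in and transition matrix channel[x][z]."""
    n_out = len(channel[0])
    p_out = [sum(p_in[x] * channel[x][z] for x in range(len(p_in))) for z in range(n_out)]
    info = 0.0
    for x, px in enumerate(p_in):
        for z in range(n_out):
            pxz = px * channel[x][z]
            if pxz > 0.0:  # terms with zero probability contribute nothing
                info += pxz * math.log2(channel[x][z] / p_out[z])
    return info


def jensen_gap(dists: list[list[float]], channel: list[list[float]]) -> float:
    """J(average of the p_n) minus the average of J(p_n); >= 0 by concavity of J."""
    n = len(dists)
    avg = [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]
    return mutual_information(avg, channel) - sum(mutual_information(d, channel) for d in dists) / n


bsc = [[0.9, 0.1], [0.1, 0.9]]  # assumed Q_MW
print(jensen_gap([[0.1, 0.9], [0.9, 0.1]], bsc))
```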
APPENDIX A

The Data-Processing Theorem and Fano's Inequality

Let $U, V, \hat{U}$ be discrete random variables that form a Markov chain. Then the data-processing theorem can be stated as

$$H(U \mid V) \le H(U \mid \hat{U}), \tag{74a}$$

or equivalently

$$I(U; V) \ge I(U; \hat{U}). \tag{74b}$$

Inequality (74a) follows on writing

$$H(U \mid V) \overset{(a)}{=} H(U \mid V, \hat{U}) \overset{(b)}{\le} H(U \mid \hat{U}),$$

where step (a) follows from (4), and (b) from the fact that conditioning decreases entropy [Ref. 1, eq. (2.3.13)].
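Next, let $U, V, \hat{U}$ be a Markov chain as above, but now assume that $U, \hat{U}$ take values in $\mathcal{U}$ ($|\mathcal{U}| < \infty$). Let

$$\lambda = \Pr\{U \ne \hat{U}\}. \tag{75}$$

Fano's inequality is

$$H(U \mid V) \le h(\lambda) + \lambda \log(|\mathcal{U}| - 1) \le h(\lambda) + \lambda \log |\mathcal{U}|. \tag{76}$$

To verify (76), define the random variable

$$\Phi = \begin{cases} 0, & U = \hat{U}, \\ 1, & U \ne \hat{U}, \end{cases}$$

and then write

$$\begin{aligned} H(U \mid V) &\overset{(a)}{\le} H(U \mid \hat{U}) \\ &= H(\Phi \mid \hat{U}) + H(U \mid \hat{U}, \Phi) \\ &\le H(\Phi) + H(U \mid \hat{U}, \Phi) \\ &= H(\Phi) + \Pr\{\Phi = 0\}\, H(U \mid \hat{U}, \Phi = 0) + \Pr\{\Phi = 1\}\, H(U \mid \hat{U}, \Phi = 1) \\ &\overset{(b)}{=} h(\lambda) + (1 - \lambda)\cdot 0 + \lambda H(U \mid \hat{U}, \Phi = 1) \\ &\overset{(c)}{\le} h(\lambda) + \lambda \log(|\mathcal{U}| - 1) \le h(\lambda) + \lambda \log |\mathcal{U}|, \end{aligned}$$

which is (76). Step (a) is (74a), step (b) follows from the fact that, given $\Phi = 0$, $U = \hat{U}$, so that $H(U \mid \hat{U}, \Phi = 0) = 0$, and step (c) from the fact that, given $\Phi = 1$, $U$ takes one of the $|\mathcal{U}| - 1$ values in $\mathcal{U}$ excluding $\hat{U}$.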
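The chain of inequalities above can be checked on small examples. The Python sketch computes $H(U \mid \hat{U})$ from a joint probability table and compares it with $h(\lambda) + \lambda\log(|\mathcal{U}| - 1)$, using base-2 logarithms. The example distributions are assumptions; in the first, an error sends $U$ uniformly to the other $|\mathcal{U}| - 1$ values, which is the case where (76) holds with equality.

```python
import math
from collections import defaultdict


def conditional_entropy(joint: dict[tuple[int, int], float]) -> float:
    """H(U | U_hat) in bits from a joint pmf {(u, u_hat): probability}."""
    marg = defaultdict(float)
    for (_, u_hat), pr in joint.items():
        marg[u_hat] += pr
    return -sum(pr * math.log2(pr / marg[u_hat]) for (_, u_hat), pr in joint.items() if pr > 0)


def fano_bound(joint: dict[tuple[int, int], float], alphabet_size: int) -> float:
    """h(lambda) + lambda log2(|U| - 1), with lambda = Pr{U != U_hat} as in (75)."""
    lam = sum(pr for (u, u_hat), pr in joint.items() if u != u_hat)
    h = 0.0 if lam in (0.0, 1.0) else -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)
    return h + lam * math.log2(alphabet_size - 1)


# U uniform on {0, 1, 2}; U_hat = U w.p. 0.8, otherwise one of the two other values.
tight = {(u, v): (0.8 if u == v else 0.1) / 3 for u in range(3) for v in range(3)}
# Same error rate, but every error goes to the single value (u + 1) mod 3.
loose = {(u, v): (0.8 if u == v else (0.2 if v == (u + 1) % 3 else 0.0)) / 3
         for u in range(3) for v in range(3)}
print(conditional_entropy(tight), fano_bound(tight, 3))
print(conditional_entropy(loose), fano_bound(loose, 3))
```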

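A variation of Fano's inequality is the following. Let $S^K, V, \hat{S}^K$ be a Markov chain where the coordinates of $S^K$ and $\hat{S}^K$ take values in the set $\mathcal{S}$. Let

$$P_{ek} = \Pr\{S_k \ne \hat{S}_k\} \tag{77a}$$

and

$$P_e = \frac{1}{K} \sum_{k=1}^{K} P_{ek}. \tag{77b}$$

We will show that Fano's inequality implies

$$\frac{1}{K} H(S^K \mid V) \le h(P_e) + P_e \log(|\mathcal{S}| - 1) \triangleq \delta(P_e). \tag{78}$$

To verify (78), write

$$\frac{1}{K} H(S^K \mid V) \overset{(a)}{\le} \frac{1}{K} \sum_{k=1}^{K} H(S_k \mid V) \overset{(b)}{\le} \frac{1}{K} \sum_{k=1}^{K} \delta(P_{ek}) \overset{(c)}{\le} \delta(P_e),$$

which is (78). Step (a) is a standard inequality, step (b) follows on applying (76) to the Markov chain $S_k, V, \hat{S}_k$, and step (c) from the concavity of $\delta(\cdot)$.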
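Step (c) in the proof of (78) uses the concavity of $\delta(\cdot)$. A short Python sketch with hypothetical per-letter error probabilities $P_{ek}$ and a hypothetical alphabet size compares the two sides.

```python
import math


def delta(p: float, alphabet_size: int) -> float:
    """delta(P) = h(P) + P log2(|S| - 1), the right member of (78)."""
    h = 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return h + p * math.log2(alphabet_size - 1)


def per_letter_vs_average(p_ek: list[float], alphabet_size: int) -> tuple[float, float]:
    """Return ((1/K) sum_k delta(P_ek), delta(P_e)) with P_e the average of the P_ek."""
    k = len(p_ek)
    p_e = sum(p_ek) / k
    return sum(delta(p, alphabet_size) for p in p_ek) / k, delta(p_e, alphabet_size)


print(per_letter_vs_average([0.0, 0.05, 0.2, 0.1], 4))  # first value <= second value
```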

APPENDIX B 
Proof of Lemma 1 

(i) With no loss of generality, let $\mathcal{X} = \{1, 2, \ldots, A\}$. Any probability distribution $p_X$ can be thought of as an $A$-vector $\mathbf{p} = (p_1, p_2, \ldots, p_A)$. Since $I(X; Y)$ is a continuous function of $p_X$, the set $\mathcal{P}(R)$ is a closed subset of the compact set of probability $A$-vectors, and hence a compact subset of Euclidean $A$-space. Since $I(X; Y \mid Z)$ is also a continuous function of $p_X$, we conclude that $I(X; Y \mid Z)$ has a maximum on $\mathcal{P}(R)$. This is part (i).

(ii) Let $0 \le R_1, R_2 \le C_M$, and $0 \le \theta \le 1$. We must show that

$$\Gamma[\theta R_1 + (1 - \theta) R_2] \ge \theta \Gamma(R_1) + (1 - \theta)\Gamma(R_2). \tag{79}$$

For $i = 1, 2$, let $\mathbf{p}_i \in \mathcal{P}(R_i)$ achieve $\Gamma(R_i)$. In other words, letting $X_i, Y_i, Z_i$ correspond to $\mathbf{p}_i$, $i = 1, 2$, then

$$I(X_i; Y_i) \ge R_i, \qquad I(X_i; Y_i \mid Z_i) = \Gamma(R_i). \tag{80}$$

Now let the random variable $X$ be defined as in Fig. 5. For $i = 1, 2$, the box labeled "$\mathbf{p}_i$" generates the random variable $X_i$ that has probability distribution $\mathbf{p}_i$. The switch takes the upper position ("position 1") with probability $\theta$ and the lower position ("position 2") with probability $1 - \theta$. Let $V$ denote the switch position. In the figure, $V = 1$. Assume that $V, X_1, X_2$ are independent. As indicated in the figure, $X = X_i$ when $V = i$, $i = 1, 2$. Now

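$$\begin{aligned} I(X; Y) &= H(Y) - H(Y \mid X) \overset{(a)}{=} H(Y) - H(Y \mid X, V) \\ &\ge H(Y \mid V) - H(Y \mid X, V) = I(X; Y \mid V) \\ &= \theta I(X; Y \mid V = 1) + (1 - \theta) I(X; Y \mid V = 2) \\ &= \theta I(X_1; Y_1) + (1 - \theta) I(X_2; Y_2) \\ &\overset{(b)}{\ge} \theta R_1 + (1 - \theta) R_2. \end{aligned} \tag{81}$$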



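[Fig. 5 diagram: box $\mathbf{p}_1$ with output $X_1$ (switch position $V = 1$) and box $\mathbf{p}_2$ with output $X_2$ (switch position $V = 2$); the switch output $X$ feeds channel $Q_M$ with output $Y$, which in turn feeds channel $Q_W$.]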

Fig. 5 — Defining the random variable X. 
Step (a) follows from the fact that $V, X, Y$ is a Markov chain and (4). Step (b) follows from (80). Inequality (81) implies that the distribution defining $X$ belongs to $\mathcal{P}[\theta R_1 + (1 - \theta) R_2]$. Thus, from the definition of $\Gamma$,

$$\Gamma[\theta R_1 + (1 - \theta) R_2] \ge I(X; Y \mid Z). \tag{82}$$

Continuing (82) and paralleling (81), we have

$$\begin{aligned} \Gamma[\theta R_1 + (1 - \theta) R_2] &\ge H(Y \mid Z) - H(Y \mid X, Z) \\ &= H(Y \mid Z) - H(Y \mid X, Z, V) \\ &\ge H(Y \mid Z, V) - H(Y \mid X, Z, V) \\ &= I(X; Y \mid Z, V) = \theta I(X; Y \mid Z, V = 1) + (1 - \theta) I(X; Y \mid Z, V = 2) \\ &= \theta I(X_1; Y_1 \mid Z_1) + (1 - \theta) I(X_2; Y_2 \mid Z_2) \\ &= \theta \Gamma(R_1) + (1 - \theta) \Gamma(R_2), \end{aligned}$$

which is (79). This is part (ii).
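The time-sharing argument of part (ii) can also be checked numerically. The Python sketch assumes binary symmetric channels for $Q_M$ (crossover 0.05) and $Q_W$ (crossover 0.1) and two arbitrary input distributions; it computes $I(X;Y \mid Z)$ through the identity $I(X;Y \mid Z) = I(X;Y) - I(X;Z)$, which holds because $X, Y, Z$ is a Markov chain, and compares mixtures with convex combinations as in (81) and (79).

```python
import math


def mi(p_in: list[float], ch: list[list[float]]) -> float:
    """Mutual information in bits between input pmf p_in and output of channel ch[x][y]."""
    p_out = [sum(p_in[x] * ch[x][y] for x in range(len(p_in))) for y in range(len(ch[0]))]
    return sum(p_in[x] * ch[x][y] * math.log2(ch[x][y] / p_out[y])
               for x in range(len(p_in)) for y in range(len(ch[0]))
               if p_in[x] * ch[x][y] > 0)


def cascade(q_m: list[list[float]], q_w: list[list[float]]) -> list[list[float]]:
    """Transition matrix of Q_MW: Q_M followed by Q_W (a matrix product)."""
    return [[sum(q_m[x][y] * q_w[y][z] for y in range(len(q_w))) for z in range(len(q_w[0]))]
            for x in range(len(q_m))]


def secrecy_term(p_in: list[float], q_m, q_w) -> float:
    """I(X;Y|Z) = I(X;Y) - I(X;Z) for the Markov chain X, Y, Z."""
    return mi(p_in, q_m) - mi(p_in, cascade(q_m, q_w))


def mix(p1: list[float], p2: list[float], theta: float) -> list[float]:
    """Distribution of X produced by the switch of Fig. 5."""
    return [theta * a + (1 - theta) * b for a, b in zip(p1, p2)]


q_m = [[0.95, 0.05], [0.05, 0.95]]  # assumed Q_M
q_w = [[0.9, 0.1], [0.1, 0.9]]      # assumed Q_W
p1, p2, theta = [0.9, 0.1], [0.2, 0.8], 0.3
print(secrecy_term(mix(p1, p2, theta), q_m, q_w),
      theta * secrecy_term(p1, q_m, q_w) + (1 - theta) * secrecy_term(p2, q_m, q_w))
```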

(iii) This part follows immediately from the definition of $\Gamma(R)$ (10), since $\mathcal{P}(R)$ is a nonincreasing set: $\mathcal{P}(R_2) \subseteq \mathcal{P}(R_1)$ whenever $R_1 \le R_2$.

(iv) Since $\Gamma(R)$ is concave on $[0, C_M]$ and nonincreasing, it must be continuous for $0 \le R < C_M$. Thus, we need only verify the continuity of $\Gamma(R)$ at $R = C_M$. Let $\mathbf{p}$ be a probability distribution on $\mathcal{X}$ viewed as a vector in Euclidean $A$-space, as in the proof of part (i). Let $\mathcal{I}(\mathbf{p})$ and $\mathcal{G}(\mathbf{p})$ be the values of $I(X; Y)$ and $I(X; Y \mid Z)$, respectively, which correspond to $\mathbf{p}$. $\mathcal{I}(\mathbf{p})$ and $\mathcal{G}(\mathbf{p})$ are continuous functions of $\mathbf{p}$.

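Now let $\{R_j\}_1^\infty$ be a monotone increasing sequence such that $R_j \to C_M$ and $R_j \le C_M$. We must show that, as $j \to \infty$,

$$\Gamma(R_j) \to \Gamma(C_M). \tag{83}$$

Now from the monotonicity of $\Gamma(R)$, $\lim_{j \to \infty} \Gamma(R_j)$ exists and

$$\lim_{j \to \infty} \Gamma(R_j) \ge \Gamma(C_M). \tag{84}$$

It remains to verify the reverse of inequality (84). Let $\{\mathbf{p}_j\}_1^\infty$ satisfy

$$\mathcal{I}(\mathbf{p}_j) \ge R_j, \qquad \mathcal{G}(\mathbf{p}_j) = \Gamma(R_j), \tag{85}$$

for $1 \le j < \infty$. Since the set of probability $A$-vectors is compact, there exists a probability distribution $\mathbf{p}^*$ on $\mathcal{X}$ such that, for some subsequence $\{\mathbf{p}_{j_k}\}_{k=1}^\infty$,

$$\lim_{k \to \infty} \mathbf{p}_{j_k} = \mathbf{p}^*.$$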
It follows from the continuity of $\mathcal{I}(\cdot)$ and (85) that $\mathcal{I}(\mathbf{p}^*) \ge C_M$, so that $\mathbf{p}^* \in \mathcal{P}(C_M)$. Therefore, from the continuity of $\mathcal{G}(\cdot)$ and (85), we have

$$\lim_{j \to \infty} \Gamma(R_j) = \lim_{k \to \infty} \Gamma(R_{j_k}) = \lim_{k \to \infty} \mathcal{G}(\mathbf{p}_{j_k}) = \mathcal{G}(\mathbf{p}^*) \overset{(a)}{\le} \Gamma(C_M), \tag{86}$$

where step (a) follows from $\mathbf{p}^* \in \mathcal{P}(C_M)$. Inequalities (84) and (86) yield (83) and part (iv).
(v) From (12),

$$\Gamma(R) = \sup_{p_X \in \mathcal{P}(R)} [I(X; Y) - I(X; Z)] \le \sup_{p_X \in \mathcal{P}(R)} I(X; Y) \le C_M,$$

which is the first inequality in part (v). Also, using (12),

$$\Gamma(C_M) = \sup_{p_X \in \mathcal{P}(C_M)} [I(X; Y) - I(X; Z)] \ge \sup_{p_X \in \mathcal{P}(C_M)} [I(X; Y) - C_{MW}] = C_M - C_{MW}. \tag{87}$$

Since $\Gamma(R)$ is nonincreasing, (87) yields $\Gamma(R) \ge \Gamma(C_M) \ge C_M - C_{MW}$, completing the proof of part (v).
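For a concrete check of part (v), the Python sketch approximates $\Gamma(R)$ by a grid search over binary input distributions, assuming a noiseless binary main channel ($C_M = 1$) and a binary symmetric wiretap channel with crossover probability 0.1 ($C_{MW} = 1 - h(0.1)$). The grid search is an illustrative approximation of the supremum, not part of the proof; at $R = C_M$ only $\Pr\{X = 1\} = 1/2$ is feasible, so the value there is exact.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gamma(r: float, eps_w: float = 0.1, steps: int = 1000) -> float:
    """Grid approximation of Gamma(R) for a noiseless binary Q_M and a BSC(eps_w) Q_W.

    With Pr{X = 1} = a: I(X;Y) = h(a) and I(X;Z) = h(a(1 - eps_w) + (1 - a) eps_w) - h(eps_w).
    Returns -inf if no grid point satisfies I(X;Y) >= R.
    """
    best = -math.inf
    for k in range(steps + 1):
        a = k / steps
        i_xy = h(a)
        if i_xy >= r:  # the grid point belongs to P(R)
            i_xz = h(a * (1 - eps_w) + (1 - a) * eps_w) - h(eps_w)
            best = max(best, i_xy - i_xz)  # I(X;Y|Z) = I(X;Y) - I(X;Z)
    return best


c_m, c_mw = 1.0, 1.0 - h(0.1)
print(gamma(c_m), c_m - c_mw)
```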

APPENDIX C 
Source with Memory 

In this appendix, we show how to modify our definitions and results for a source with memory. We will take the source output sequence $\{S_k\}$ to be a stationary, ergodic sequence (where $S_k$ takes values in $\mathcal{S}$) with entropy (as defined in Ref. 1, Section 3.5) of $H_S$. As in Section II, we continue to assume that $|\mathcal{S}| < \infty$ and that the source statistics are known.

The channels $Q_M$ and $Q_W$ remain as in Section II, as does the definition of an encoder-decoder with parameters $N$ and $K$. The definition of $P_e$ also remains unchanged, but a new definition for $\Delta$ is necessary. To see this, let us suppose that the source is binary, i.e., $\mathcal{S} = \{0, 1\}$, with entropy $H_S$, and with $H(S_1) > H_S$. Suppose also that the channel $Q_M$ is a noiseless binary channel, and that $Q_W$ has zero capacity. A possible encoder-decoder has $K = N = 1$ and takes $X_1 = S_1$. Such a scheme has $P_e = 0$, but with $\Delta$ as defined in (7) given by $\Delta = H(S_1) > H_S$. Using (9), this would lead us to accept the pair $[H_S, H(S_1)]$ as achievable, which would not be reasonable. Accordingly, we give a new definition of $\Delta$.

Let $S^K, Z^N$ correspond to an encoder with parameters $K, N$ as defined in Section II. Let $S^K(j), Z^N(j)$, $j = 1, 2, \ldots, \nu$, correspond to the $\nu$ successive repetitions of the encoding process. Then define the equivocation at the wire-tap as

$$\Delta = \lim_{\nu \to \infty} \frac{1}{K\nu} H[S^K(1), \ldots, S^K(\nu) \mid Z^N(1), \ldots, Z^N(\nu)] = \lim_{\nu \to \infty} \frac{1}{K\nu} H(S^{K\nu} \mid Z^{N\nu}). \tag{88}$$

With $\Delta$ as defined by (88), we define the sets of Section II exactly as before. We claim that Theorem 2 remains valid.

The proof of the converse half of Theorem 2 given in Section IV goes over to the case where the source has memory with only trivial changes. Further, the results in Section V are all valid exactly for the source with memory. They yield that, if $(R, d)$ satisfies (56), then we can, for arbitrary $\epsilon > 0$, find an encoder-decoder with parameters $N$, $K$, and $P_e$ which satisfies

$$\frac{KH_S}{N} \ge R - \epsilon, \tag{89a}$$

$$P_e \le \epsilon, \tag{89b}$$

$$\frac{1}{K} H(S^K \mid Z^N) \ge d - \epsilon. \tag{89c}$$

Further, we can do this for arbitrarily large $K$. We show below that there exists a function $f(K)$, $K = 1, 2, \ldots$, such that for any code with parameters $K, N$,

$$\Delta = \lim_{\nu \to \infty} \frac{1}{K\nu} H(S^{K\nu} \mid Z^{N\nu}) \ge \frac{1}{K} H(S^K \mid Z^N) - f(K), \tag{90}$$

where $\lim_{K \to \infty} f(K) = 0$, and $f(K)$ depends only on the source statistics. Combining (90) with (89c), we have

$$\Delta \ge d - \epsilon - f(K).$$

Since $f(K) \to 0$, we conclude that $(R, d)$ is achievable. This is the direct half of Theorem 2. It remains to verify (90).

First, imagine that the encoder-decoder begins operation infinitely far in the past. Let $[S(j), Z(j)]$ be the $(S^K, Z^N)$ corresponding to the $j$th encoding operation, $-\infty < j < \infty$. Thus, $S^{K\nu} = (S_1, \ldots, S_{K\nu}) = [S(1), \ldots, S(\nu)]$ and $Z^{N\nu} = [Z(1), \ldots, Z(\nu)]$, $\nu = 1, 2, \ldots$. Let $Z^\infty = [\ldots, Z(-1), Z(0), Z(1), \ldots]$. Of course,

$$H(S^{K\nu} \mid Z^{N\nu}) \ge H(S^{K\nu} \mid Z^\infty). \tag{91}$$
Further,

$$\begin{aligned} H(S^{K\nu} \mid Z^\infty) &= H[S(1), \ldots, S(\nu) \mid Z^\infty] \\ &\overset{(a)}{=} \sum_{j=1}^{\nu} H[S(j) \mid Z^\infty, S(j+1), \ldots, S(\nu)] \\ &\overset{(b)}{=} \sum_{j=1}^{\nu} H[S(1) \mid Z^\infty, S(2), \ldots, S(j)] \\ &\overset{(c)}{\ge} \nu H[S(1) \mid Z^\infty, S(2), \ldots, S(\nu)] \ge \nu H[S(1) \mid Z^\infty, S'], \end{aligned} \tag{92}$$

where $S' = [S(2), S(3), \ldots]$. Step (a) is a standard identity, step (b) follows from the stationarity of the sequence $\{S_k\}$ and the memorylessness of the channel $Q_{MW}$, and step (c) follows from the fact that conditioning decreases entropy. Now, let

$$S = S^K = S(1), \qquad S' = [S(2), S(3), \ldots],$$

$$Z = Z^N = Z(1), \qquad Z' = [\ldots, Z(-1), Z(0), Z(2), \ldots].$$

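Thus, (91) and (92) become

$$\begin{aligned} \frac{1}{K\nu} H(S^{K\nu} \mid Z^{N\nu}) &\ge \frac{1}{K} H(S \mid Z, Z', S') \\ &= \frac{1}{K}[H(S, Z \mid Z', S') - H(Z \mid Z', S')] \\ &= \frac{1}{K}[H(S \mid Z', S') + H(Z \mid S, Z', S') - H(Z \mid Z', S')] \\ &\overset{(a)}{=} \frac{1}{K}[H(S \mid S') + H(Z \mid S) - H(Z \mid Z', S')] \\ &\ge \frac{1}{K}[H(S \mid S') + H(Z \mid S) - H(Z)]. \end{aligned} \tag{93}$$

Step (a) follows from the fact that $Z', S', S$ and $(S', Z'), S, Z$ are Markov chains, and (4). Now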



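$$\frac{1}{K} H(S \mid S') = \frac{1}{K} \sum_{k=1}^{K} H(S_k \mid S', S_{k+1}, \ldots, S_K) = \frac{1}{K} \sum_{k=1}^{K} H_S = H_S, \tag{94}$$

since, by stationarity, $H(S_k \mid S_{k+1}, S_{k+2}, \ldots) = H_S$. Also,

$$\frac{1}{K} H(S) - H_S \le f(K) \to 0, \quad \text{as } K \to \infty. \tag{95}$$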
Substituting (95) and (94) into (93), we have

$$\frac{1}{K\nu} H(S^{K\nu} \mid Z^{N\nu}) \ge \frac{1}{K}[H(S) + H(Z \mid S) - H(Z)] - f(K) = \frac{1}{K} H(S \mid Z) - f(K),$$

which, on letting $\nu \to \infty$, is (90).
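For a concrete feel for $f(K)$ in (95), consider a stationary first-order binary Markov source (an assumption; the appendix allows any stationary ergodic source). By the chain rule, $H(S^K) = H(S_1) + (K-1)H_S$, so $(1/K)H(S^K) - H_S = [H(S_1) - H_S]/K$. The Python sketch computes this gap; a symmetric chain with switching probability 0.1 also gives an instance of the example at the start of this appendix, with $H(S_1) = 1 > H_S = h(0.1)$.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def markov_source(alpha: float, beta: float) -> tuple[float, float]:
    """Stationary binary Markov source with Pr{1 | 0} = alpha and Pr{0 | 1} = beta.

    Returns (H(S_1), H_S) in bits, where H_S is the entropy rate.
    """
    pi1 = alpha / (alpha + beta)  # stationary Pr{S_k = 1}
    return h(pi1), (1 - pi1) * h(alpha) + pi1 * h(beta)


def block_entropy_gap(alpha: float, beta: float, k: int) -> float:
    """(1/K) H(S^K) - H_S, using H(S^K) = H(S_1) + (K - 1) H_S."""
    h_s1, h_rate = markov_source(alpha, beta)
    return (h_s1 + (k - 1) * h_rate) / k - h_rate


for k in (1, 10, 100):
    print(k, block_entropy_gap(0.1, 0.1, k))
```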